Two-Shot SVBRDF Capture for Stationary Materials
Code and data

Return to main project page

About

This is the source code and data release for the paper Two-Shot SVBRDF Capture for Stationary Materials by Miika Aittala, Tim Weyrich and Jaakko Lehtinen, published in ACM Transactions on Graphics 34(4) (Proc. SIGGRAPH 2015).

License

Copyright (c) 2013-2015 Miika Aittala, Jaakko Lehtinen, Tim Weyrich, Aalto University, University College London.

The code (with the exception of the Simplified Steerable Pyramid, NVIDIA framework, write_pfm.m and read_pfm.m, which are subject to their own licenses) is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (http://creativecommons.org/licenses/by-nc-sa/4.0/).

The data is licensed under a Creative Commons Attribution 3.0 Unported License.

Please include a citation to our paper if you use the data in an academic research paper. For other uses, we ask you to link to the project webpage, and include the text "Material models courtesy of Miika Aittala, Tim Weyrich, and Jaakko Lehtinen".

Download

Source code (Matlab/CUDA/C++)
Dataset: Input data (7zip, ~16 GB)
Dataset: Output texture maps (7zip, ~13 GB)
Dataset for the Oculus Rift viewer: Height maps (zip, ~1.3 GB) (Thanks to Joep Moritz for computing the maps.)

Data

The data is split into two parts: the input data and the render-ready output. Extract them into the same directory structure (e.g. texturesvbrdf/data/), so that each dataset ends up occupying a single directory of its own.

The contents of this directory are as follows:

Running the full solver also produces other files, which were left out of the data packages because they are large and store a lot of intermediate data that is no longer used:

Source code

The source code package contains two separate programs: the Matlab source code for the method itself, and a simple example Visual C++ project for rendering the results under point-light illumination. The Matlab Image Processing Toolbox, the Parallel Computing Toolbox, and a CUDA-capable GPU are required.

The code is research code, and hence unfortunately not always particularly readable, flexible or efficient. Its main purpose is to serve as a reference. It is not intended as a production tool, and using it properly requires an understanding of the technical details in the paper; please also recall that commercial use of the code is prohibited by the license.

Matlab optimizer

SOLVE_ALL.m runs the entire pipeline for all the datasets (though you may want to run multiple solves in parallel in separate Matlab instances); follow the function calls to get an idea of where the different parts of the algorithm are implemented and how they are called.

The main steps are executed by tex_compute_transport_cuda.m, tex_alternate.m, and tex_reverse_output.m. These implement, respectively, the initial reflectance transport step; the main fitting step (including the alternation between the Heeger-Bergen and Levenberg-Marquardt steps); and the reverse transport and final image output steps.

The CUDA source file cuda_feature.cu should be compiled to PTX with nvcc. The command for this is included in the file as a comment.
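For reference, the generic form of the compilation looks something like the following, run either from a shell or from within Matlab (prefer the exact flags given in the comment inside cuda_feature.cu):

    >> system('nvcc -ptx cuda_feature.cu -o cuda_feature.ptx');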

Also included is the source code for steerable pyramid decomposition and reconstruction by Eero Simoncelli and Dzung Nguyen (http://www.mathworks.com/matlabcentral/fileexchange/36488-simplified-steerable-pyramid/). Some files were modified to better suit our purposes; these were renamed with the suffix 2. These files must be on your Matlab path when running the solver.
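For example, if you extracted the pyramid files into a subdirectory of the source package, adding them to the path is a one-liner (the directory name below is just a placeholder):

    >> addpath('steerable_pyramid');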

Older versions of Matlab do not recognize the InitDamping parameter of lsqnonlin(), which can cause an error in the optimizer. This setting is useful because the default initial damping sometimes allows extremely bold steps in the first iterations, which can corrupt some pixels if they happen to land near a distant, poor local minimum. If you have an old Matlab, you can work around this with a dirty hack: copy Matlab's own levenbergMarquardt.m into your source directory and modify the four rows containing sqrt(lambda... into sqrt(100*lambda....
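For reference, on a Matlab version that does support it, the setting corresponds roughly to the following (a sketch only; the residual function, initial guess and damping value below are placeholders, not the ones used in our optimizer):

    >> opts = optimoptions('lsqnonlin', ...
         'Algorithm', 'levenberg-marquardt', ...
         'InitDamping', 1);   % the default is 0.01; a larger value means more cautious first steps
    >> x = lsqnonlin(@my_residuals, x0, [], [], opts);   % my_residuals and x0 are placeholders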

Note that the iPhone 5 field of view and resolution are currently hardcoded into the source, at the beginning of tex_alternate.m. You should change these if you use another device (of course, a better idea would be to modify the fdata struct to include these parameters).
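When adapting to another device, the field of view and the image width essentially combine into a pinhole focal length in pixels via the standard relation below (a generic sketch with placeholder numbers; these are not the variable names used in tex_alternate.m):

    >> width_px = 3264;                    % horizontal image resolution of your camera
    >> fov_h    = 60 * pi/180;             % horizontal field of view in radians (placeholder)
    >> f_px     = (width_px/2) / tan(fov_h/2)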

C++ viewer

The viewer is intended as a very simple, rough example of how to use the materials produced by the solver. In particular, it contains GLSL fragment shader source code for the material model. The UI is built on a rendering/UI framework courtesy of NVIDIA Corporation, included in the solution.

The viewer is operated by clicking the small buttons (hover over a button to see its name) and dragging the sliders on the right edge. Press the Load Textures button to load a dataset by choosing any of its PFM files. The light source can be translated and adjusted with the relevant sliders on the right. Navigate by dragging with the mouse (the controls are presently a bit unintuitive, due to some quick and dirty coordinate system hacks). Rotate the mouse wheel to change the navigation speed.

Usage

  1. Take a flash and a guide photo of the material. Be sure to read the capture guidelines further down in this document first.

  2. If you didn't use a support such as a tripod, you need to align the two photos (if you did, skip this step). We provide a simple Matlab tool, align_imgs, for this purpose. It takes two image file names as arguments. The idea is to warp the first image onto the second by a homography, using four point correspondences. You can use the basic zoom and pan tools to get a closer look at the images. The alignment needs to be close to pixel-perfect, so make sure to zoom in close. Once you're ready, click on the corresponding point in each image to set the first correspondence. Repeat for the remaining three, placing each one preferably close to a different image corner. After you have placed the four points, close the tool window. The output image is written to a file named image1_to_image2.png (with image1 and image2 replaced by your input image names).

    Generally we aligned the guide image onto the flash image, so that the latter stays undistorted. At this point, you may verify that the alignment of the warped image matches the target by flipping back and forth between them.

    Example usage:

    >> align_imgs('d:\img\stuff\IMG_1683.JPG', 'd:\img\stuff\IMG_1682.JPG');
                    

  3. Once you have the aligned flash and guide photos, place them in a directory for this dataset, and name them flash.(png or jpg or tiff) and guide.(png or jpg or tiff). The following command builds a dataset from the two images:

    >> tex_build_data('d:/texture/data/yourdataset/')
                    

    This opens a window for choosing the crop region and the tiling. Maximize it to get a good view. Use the arrow keys to adjust the number of tiles, + and - to adjust the size of the crop region, and mouse clicks to place the upper left corner of the crop region. Any cropping should be done with this tool, so that the information about the changed FOV is saved properly. For more information about good choices of tiling and crop, read the guidelines below. You can switch between the master and guide image by pressing Space (just to get a different visualization if you want). Make sure the crop region stays within the image. Once you're done, press Enter. You can now choose the master tile position by clicking with the mouse. Finally, press Enter again.

    The output is a directory named out, and within it a file fdata.mat, which contains a Matlab struct with all the relevant images and parameters. If you want to have a look inside it, use the supplied function tex_load_data() to read it.

  4. Run the solver steps. The easiest way to do this is to modify the SOLVE_ALL.m script, e.g. by replacing the path and the default list of datasets with your dataset, as sketched below. Each major step called by SOLVE_ALL saves its result in a separate file, so if you for example want to experiment with the optimizer, you don't need to rerun the reflectance transport every time. The result is output in PFM format, as described above.
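    The dataset-list edit amounts to something like the following (the variable names here are hypothetical; check SOLVE_ALL.m for the actual ones):

    datapath = 'd:/texture/data/';
    datasets = {'yourdataset'};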

Guidelines for successful capture

While the method tolerates many kinds of input, some general guidelines should be followed for optimal quality. Please have a look at the input flash and guide images in our datasets to get a general idea of how they should look.

In particular, the flash photo should show a clear circular or elliptical highlight somewhere near the image center (exact centering is not required), as this brings out the surface glossiness and shape variations. In general, the surface properties of interest should be evident in the flash photo on casual visual inspection, so that the algorithm has something to work with. The surface should be such that it can reasonably be modeled by the standard diffuse+specular+heightmap model; note that e.g. cloth often pushes this limit and might not be optimally suited for the method. There should be as little large-scale 3D shape (such as wrinkles) as possible, as it can break the clean image of the highlight. Some ambient light is tolerated, but too much may drown out the flash itself.

The guide photo should contain as little large-scale variation as possible. In particular, direct shadows, or reflections of the camera itself taking the picture, should be avoided. The only assumption the method makes about the guide photo illumination is that it is distant; hence the ambient lighting environment does not need to be uniform or smooth (like an overcast sky). For example, a distant oblique point light source can be fine, as long as it doesn't cause a visible large-scale highlight in the guide photo. In fact it can even be very good, in the sense that it can bring out the small detail clearly.

In our data we used an iPhone 5 and the default camera app, but other similar devices should work as well. Note that the solver is currently hardcoded to assume the iPhone 5 resolution and FOV (you can change them in tex_alternate.m). It is important that the camera and the flash are as co-located as possible, as the rendering model assumes this. The image should be taken with a wide FOV (i.e. no telephoto, and as little cropping as possible) to get good coverage of view/light angles.

The shooting distance should be such that the flash image can be split into roughly 15x10 tiles (reasonable choices range from about 10x7 to 25x15), in such a way that 1) each tile contains some instances of all the features present in the texture (i.e. any tile should look like a good "summary" of the whole texture even in isolation), and 2) the lighting is approximately constant within each tile (i.e. no strong "linear gradient" in any tile). Again, have a look at the input photos in our data, and at the intermediate visualizations during the solve, to gain some intuition about this. The algorithm is not highly sensitive to the choice of the tiling pitch, but going too far in either direction will likely result in lower fidelity. The choice of the master tile location doesn't matter in principle, but you may want to manually choose a clean, representative region that contains no outliers and has good image quality.
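As a quick sanity check of the pitch: assuming an 8-megapixel photo like the iPhone 5 produces (3264x2448 pixels), a 15x10 tiling gives tiles of roughly 218x245 pixels:

    >> imgsize  = [3264 2448];        % width x height in pixels
    >> tiles    = [15 10];
    >> tilesize = imgsize ./ tiles    % approximately [218 245] pixels per tile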

The flash and guide photos must be accurately aligned (though small misalignments of a couple of pixels appear to be tolerated in many of our datasets). This can be ensured either by not moving the camera between the two photos (in practice, by using some mechanical support like a tripod), or by aligning the images in post using point correspondences, as detailed above. Note that it might be difficult to find matching points in densely repeating uniform textures with thousands of near-identical features scattered around the surface. In such cases it might be a good idea to introduce at least four small markers (like pieces of tape) near the very corners of the image, and to crop them out after aligning.
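If you prefer to script the alignment rather than use the interactive tool, the warp boils down to a projective fit on the clicked correspondences. A minimal sketch using the Image Processing Toolbox (the file names are placeholders, and align_imgs.m itself may do this differently):

    >> moving = imread('guide.jpg');                      % image to be warped
    >> fixed  = imread('flash.jpg');                      % reference image
    >> [mp, fp] = cpselect(moving, fixed, 'Wait', true);  % click at least four correspondences
    >> tform  = fitgeotrans(mp, fp, 'projective');
    >> warped = imwarp(moving, tform, 'OutputView', imref2d([size(fixed,1) size(fixed,2)]));
    >> imwrite(warped, 'guide_to_flash.png');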

In terms of limitations on specularity, the method might not correctly resolve very glossy highlights (where the highlight in the flash photo is small enough to be comparable to the tile size -- notice how in this case it is difficult to do the tile splitting properly), nor very dull highlights (where the entire highlight does not fit within the flash image, and may be confused with the diffuse component).

Questions, comments

miikadotaittalaataaltodotfi