
Screened Poisson reconstruction code by Tero Karras (Copyright 2015 NVIDIA Corporation), slightly modified
by Markus Kettunen and Marco Manzi for inclusion in the source code release of Gradient-Domain Path Tracing
and Gradient-Domain Bidirectional Path Tracing by Markus Kettunen and Marco Manzi.

To use it with the source code release, compile the code as a static library and add it to the
Mitsuba renderer dependencies (the default VC++ project settings create a static library). The code still
works as a stand-alone command-line tool if compiled as an application. Instructions for using the
stand-alone tool are given below in the original readme:


Poisson solver v1.0
-------------------
Implementation by Tero Karras (tkarras@nvidia.com)
Copyright 2015 NVIDIA Corporation

Reconstruct an image from its gradients by solving the screened Poisson equation.
Tested on 64-bit Windows 7, Xeon E5-2670 (20 cores), CUDA 6.5, and GeForce GTX 980 (compute capability 5.2).
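
To illustrate what "solving the screened Poisson equation" means, here is a minimal 1D sketch in plain
Python. It is not the code shipped in this package. It assumes the least-squares (L2) objective commonly
used in gradient-domain rendering, alpha^2 * |x - primal|^2 + |D x - grad|^2, where D is a forward
difference; whether this solver weights the throughput term by alpha or alpha^2 is an assumption here.
The function name and parameters are hypothetical.

```python
def screened_poisson_1d(primal, grad, alpha, max_iters=100):
    """Minimize alpha^2*|x - primal|^2 + |D x - grad|^2 with conjugate gradient.

    primal: noisy signal (n values), playing the role of the throughput image.
    grad:   noisy forward differences (n - 1 values), grad[i] ~ x[i+1] - x[i].
    The alpha^2 weighting is an assumption made for this illustration.
    """
    n = len(primal)
    a2 = alpha * alpha

    def apply_system(x):
        # Computes (alpha^2 * I + D^T D) x without forming the matrix.
        out = [a2 * v for v in x]
        for i in range(n - 1):
            d = x[i + 1] - x[i]
            out[i] -= d
            out[i + 1] += d
        return out

    # Right-hand side: alpha^2 * primal + D^T grad.
    rhs = [a2 * v for v in primal]
    for i in range(n - 1):
        rhs[i] -= grad[i]
        rhs[i + 1] += grad[i]

    x = list(primal)  # start from the noisy primal signal
    r = [b - a for b, a in zip(rhs, apply_system(x))]
    p = list(r)
    rr = sum(v * v for v in r)
    for _ in range(max_iters):
        if rr < 1e-24:  # residual is negligible, stop early
            break
        ap = apply_system(p)
        step = rr / sum(pi * ai for pi, ai in zip(p, ap))
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * ai for ri, ai in zip(r, ap)]
        rr_new = sum(v * v for v in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x
```

With a flat primal signal and gradients that describe a ramp, the result recovers the ramp shape from the
gradients while keeping the mean of the primal signal. A small alpha trusts the gradients more, a large
alpha trusts the primal signal more.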

Example:
  > poisson-Win32-Release.exe -dx scenes/bathroom-dx.pfm -alpha 0.2 -brightness 2
  Using CUDA device 0: GeForce GTX 980
  Execution time = 0.89 s
  L1 error = 0.0190704
  L2 error = 0.00301321
  PSNR vs. reference = 9.88 dB

Usage: poisson.exe [OPTIONS]

Input images in PFM format:
  -dx         <PFM>  Noisy horizontal gradient image. Typically '<BASE>-dx.pfm'.
  -dy         <PFM>  Noisy vertical gradient image. Default is '<BASE>-dy.pfm' based on -dx.
  -throughput <PFM>  Noisy throughput image. Default is '<BASE>-throughput.pfm' based on -dx.
  -direct     <PFM>  Direct light image. Default is '<BASE>-direct.pfm' based on -dx.
  -reference  <PFM>  Reference image for PSNR. Default is '<BASE>-reference.pfm' based on -dx.
  -alpha      <0.2>  How much weight to put on the throughput image compared to the gradients.
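
For readers unfamiliar with PFM (Portable FloatMap), the following Python sketch reads such a file. It
follows the commonly documented layout: a "PF" (3 channels) or "Pf" (1 channel) line, a "width height"
line, a scale line whose sign selects the byte order (negative means little-endian), then raw 32-bit
floats with rows stored bottom to top. The function name is hypothetical, and the reader built into this
tool may handle edge cases differently.

```python
import struct

def read_pfm(stream):
    """Read a PFM image from a binary stream.

    Returns (width, height, channels, rows), where rows is a list of
    per-row float lists ordered top to bottom.
    """
    magic = stream.readline().strip()
    if magic == b"PF":
        channels = 3
    elif magic == b"Pf":
        channels = 1
    else:
        raise ValueError("not a PFM file")

    width, height = (int(tok) for tok in stream.readline().split())
    scale = float(stream.readline().strip())
    endian = "<" if scale < 0 else ">"  # negative scale: little-endian

    count = width * height * channels
    raw = stream.read(4 * count)
    if len(raw) != 4 * count:
        raise ValueError("truncated PFM data")
    data = struct.unpack(f"{endian}{count}f", raw)

    row_len = width * channels
    rows = [list(data[y * row_len:(y + 1) * row_len]) for y in range(height)]
    rows.reverse()  # the file stores the bottom row first
    return width, height, channels, rows
```

A file can be loaded with, for example, `with open(path, "rb") as f: read_pfm(f)`.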

Output images in PFM format:
  -indirect   <PFM>  Solved indirect light image. Default is '<BASE>-indirect.pfm' based on -dx.
  -final      <PFM>  Direct plus indirect. Default is '<BASE>-final.pfm' based on -dx.
  -noindirect        Do not output indirect light image.
  -nofinal           Do not output final image.
  -nopfm             Do not output any PFM images.
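
The default file names above all derive from the -dx argument. The Python sketch below shows one way to
read that convention when preparing inputs or locating outputs in a script. The helper name is
hypothetical, and how the tool itself treats a -dx path that does not end in '-dx.pfm' is not documented
here, so this sketch simply rejects such paths.

```python
def default_paths(dx_path):
    """Derive the default '<BASE>-<name>.pfm' paths from a '<BASE>-dx.pfm' path."""
    suffix = "-dx.pfm"
    if not dx_path.endswith(suffix):
        # Assumption for this sketch: refuse paths without the expected suffix.
        raise ValueError("expected a path ending in '-dx.pfm'")
    base = dx_path[:-len(suffix)]
    names = ("dy", "throughput", "direct", "reference", "indirect", "final")
    return {name: f"{base}-{name}.pfm" for name in names}
```

For the example command at the top of this readme, `default_paths("scenes/bathroom-dx.pfm")["final"]`
evaluates to "scenes/bathroom-final.pfm".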

PNG conversion:
  -brightness 1.0    Scale image intensity before converting to sRGB color space.
  -pngin             Convert all input images to PNG. By default, this is done for PNG files that do not exist.
  -nopngin           Do not convert input images to PNG.
  -pngout            Convert all output images to PNG. This is the default.
  -nopngout          Do not convert output images to PNG.
  -nopng             Do not output any PNG images.
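
As an illustration of what scaling by -brightness before the sRGB conversion amounts to, here is a small
Python sketch that maps one linear channel value to an 8-bit sRGB value using the standard sRGB transfer
function. The clamping and rounding choices, and the function name, are assumptions of this sketch; the
tool's own conversion may differ in those details.

```python
def linear_to_srgb8(value, brightness=1.0):
    """Scale a linear value by brightness, then encode it as 8-bit sRGB."""
    v = min(max(value * brightness, 0.0), 1.0)  # clamp to the displayable range
    if v <= 0.0031308:
        s = 12.92 * v                           # linear segment near black
    else:
        s = 1.055 * v ** (1.0 / 2.4) - 0.055    # power-law segment
    return int(round(s * 255.0))
```

With -brightness 2, a linear value of 0.5 is scaled to 1.0 before encoding and therefore maps to 255.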

Other options:
  -backend  CUDA     Enable GPU acceleration using CUDA. Requires a GPU with compute capability 3.0 or higher.
  -backend  OpenMP   Enable multicore acceleration using OpenMP.
  -backend  Naive    Use naive single-threaded CPU implementation.
  -backend  Auto     Use 'CUDA' if available, or fall back to 'OpenMP' if not. This is the default.
  -device   0        Choose the CUDA device to use. Only applicable to 'CUDA' and 'Auto'.
  -verbose, -v       Enable verbose printouts.
  -display, -d       Display progressive image refinement during the solver.
  -help,    -h       Display this help text.

Solver presets (default is L1D):
  -config  L1D      L1 default config: ~1s for 1280x720 on GTX980, L1 error lower than MATLAB reference.
  -config  L1Q      L1 high-quality config: ~50s for 1280x720 on GTX980, L1 error as low as possible.
  -config  L1L      L1 legacy config: ~89s for 1280x720 on GTX980, L1 error equal to MATLAB reference.
  -config  L2D      L2 default config: ~0.1s for 1280x720 on GTX980, L2 error equal to MATLAB reference.
  -config  L2Q      L2 high-quality config: ~0.5s for 1280x720 on GTX980, L2 error as low as possible.

Solver configuration:
  -irlsIterMax 20   Number of iteratively reweighted least squares (IRLS) iterations.
  -irlsRegInit 0.05 Initial value of the IRLS regularization parameter.
  -irlsRegIter 0.5  Multiplier for the IRLS regularization parameter on subsequent iterations.
  -cgIterMax   50   Maximum number of conjugate gradient (CG) iterations per IRLS iteration.
  -cgIterCheck 100  Check status every N iterations (incl. early exit, CPU-GPU sync, printouts, image display).
  -cgPrecond   0    0 = regular conjugate gradient (optimized), 1 = preconditioned conjugate gradient (experimental).
  -cgTolerance 0    Stop CG iteration when the weight L2 error  (errL2
