diff --git a/README.md b/README.md index ae6088d..0aff116 100644 --- a/README.md +++ b/README.md @@ -3,271 +3,35 @@ CIS 565 Project3 : CUDA Pathtracer Fall 2014 -Due Wed, 10/8 (submit without penalty until Sun, 10/12) - -## INTRODUCTION -In this project, you will implement a CUDA based pathtracer capable of -generating pathtraced rendered images extremely quickly. Building a pathtracer can be viewed as a generalization of building a raytracer, so for those of you who have taken 460/560, the basic concept should not be very new to you. For those of you that have not taken -CIS460/560, raytracing is a technique for generating images by tracing rays of -light through pixels in an image plane out into a scene and following the way -the rays of light bounce and interact with objects in the scene. More -information can be found here: -http://en.wikipedia.org/wiki/Ray_tracing_(graphics). Pathtracing is a generalization of this technique by considering more than just the contribution of direct lighting to a surface. - -Since in this class we are concerned with working in generating actual images -and less so with mundane tasks like file I/O, this project includes basecode -for loading a scene description file format, described below, and various other -things that generally make up the render "harness" that takes care of -everything up to the rendering itself. The core renderer is left for you to -implement. Finally, note that while this basecode is meant to serve as a -strong starting point for a CUDA pathtracer, you are not required to use this -basecode if you wish, and you may also change any part of the basecode -specification as you please, so long as the final rendered result is correct. - -## CONTENTS -The Project3 root directory contains the following subdirectories: - -* src/ contains the source code for the project. 
Both the Windows Visual Studio - solution and the OSX and Linux makefiles reference this folder for all - source; the base source code compiles on Linux, OSX and Windows without - modification. If you are building on OSX, be sure to uncomment lines 4 & 5 of - the CMakeLists.txt in order to make sure CMake builds against clang. -* data/scenes/ contains an example scene description file. -* renders/ contains an example render of the given example scene file. -* windows/ contains a Windows Visual Studio 2010 project and all dependencies - needed for building and running on Windows 7. If you would like to create a - Visual Studio 2012 or 2013 projects, there are static libraries that you can - use for GLFW that are in external/bin/GLFW (Visual Studio 2012 uses msvc110, - and Visual Studio 2013 uses msvc120) -* external/ contains all the header, static libraries and built binaries for - 3rd party libraries (i.e. glm, GLEW, GLFW) that we use for windowing and OpenGL - extensions - -## RUNNING THE CODE -The main function requires a scene description file (that is provided in data/scenes). -The main function reads in the scene file by an argument as such : -'scene=[sceneFileName]' - -If you are using Visual Studio, you can set this in the Debugging > Command Arguments section -in the Project properties. 
- -## REQUIREMENTS -In this project, you are given code for: - -* Loading, reading, and storing the scene scene description format -* Example functions that can run on both the CPU and GPU for generating random - numbers, spherical intersection testing, and surface point sampling on cubes -* A class for handling image operations and saving images -* Working code for CUDA-GL interop - -You will need to implement the following features: - -* Raycasting from a camera into a scene through a pixel grid -* Diffuse surfaces -* Perfect specular reflective surfaces -* Cube intersection testing -* Sphere surface point sampling -* Stream compaction optimization - -You are also required to implement at least 2 of the following features: - -* Texture mapping -* Bump mapping -* Depth of field -* Refraction, i.e. glass -* OBJ Mesh loading and rendering -* Interactive camera -* Motion blur -* Subsurface scattering - -The 'extra features' list is not comprehensive. If you have a particular feature -you would like to implement (e.g. acceleration structures, etc.) please contact us -first! - -For each 'extra feature' you must provide the following analysis : -* overview write up of the feature -* performance impact of the feature -* if you did something to accelerate the feature, why did you do what you did -* compare your GPU version to a CPU version of this feature (you do NOT need to - implement a CPU version) -* how can this feature be further optimized (again, not necessary to implement it, but - should give a roadmap of how to further optimize and why you believe this is the next - step) - -## BASE CODE TOUR -You will be working in three files: raytraceKernel.cu, intersections.h, and -interactions.h. Within these files, areas that you need to complete are marked -with a TODO comment. Areas that are useful to and serve as hints for optional -features are marked with TODO (Optional). Functions that are useful for -reference are marked with the comment LOOK. 
- -* raytraceKernel.cu contains the core raytracing CUDA kernel. You will need to - complete: - * cudaRaytraceCore() handles kernel launches and memory management; this - function already contains example code for launching kernels, - transferring geometry and cameras from the host to the device, and transferring - image buffers from the host to the device and back. You will have to complete - this function to support passing materials and lights to CUDA. - * raycastFromCameraKernel() is a function that you need to implement. This - function once correctly implemented should handle camera raycasting. - * raytraceRay() is the core raytracing CUDA kernel; all of your pathtracing - logic should be implemented in this CUDA kernel. raytraceRay() should - take in a camera, image buffer, geometry, materials, and lights, and should - trace a ray through the scene and write the resultant color to a pixel in the - image buffer. - -* intersections.h contains functions for geometry intersection testing and - point generation. You will need to complete: - * boxIntersectionTest(), which takes in a box and a ray and performs an - intersection test. This function should work in the same way as - sphereIntersectionTest(). - * getRandomPointOnSphere(), which takes in a sphere and returns a random - point on the surface of the sphere with an even probability distribution. - This function should work in the same way as getRandomPointOnCube(). You can - (although do not necessarily have to) use this to generate points on a sphere - to use a point lights, or can use this for area lighting. - -* interactions.h contains functions for ray-object interactions that define how - rays behave upon hitting materials and objects. You will need to complete: - * getRandomDirectionInSphere(), which generates a random direction in a - sphere with a uniform probability. 
This function works in a fashion - similar to that of calculateRandomDirectionInHemisphere(), which generates a - random cosine-weighted direction in a hemisphere. - * calculateBSDF(), which takes in an incoming ray, normal, material, and - other information, and returns an outgoing ray. You can either implement - this function for ray-surface interactions, or you can replace it with your own - function(s). - -You will also want to familiarize yourself with: - -* sceneStructs.h, which contains definitions for how geometry, materials, - lights, cameras, and animation frames are stored in the renderer. -* utilities.h, which serves as a kitchen-sink of useful functions - -## NOTES ON GLM -This project uses GLM, the GL Math library, for linear algebra. You need to -know two important points on how GLM is used in this project: - -* In this project, indices in GLM vectors (such as vec3, vec4), are accessed - via swizzling. So, instead of v[0], v.x is used, and instead of v[1], v.y is - used, and so on and so forth. -* GLM Matrix operations work fine on NVIDIA Fermi cards and later, but - pre-Fermi cards do not play nice with GLM matrices. As such, in this project, - GLM matrices are replaced with a custom matrix struct, called a cudaMat4, found - in cudaMat4.h. A custom function for multiplying glm::vec4s and cudaMat4s is - provided as multiplyMV() in intersections.h. - -## SCENE FORMAT -This project uses a custom scene description format. -Scene files are flat text files that describe all geometry, materials, -lights, cameras, render settings, and animation frames inside of the scene. -Items in the format are delimited by new lines, and comments can be added at -the end of each line preceded with a double-slash. 
- -Materials are defined in the following fashion: - -* MATERIAL (material ID) //material header -* RGB (float r) (float g) (float b) //diffuse color -* SPECX (float specx) //specular exponent -* SPECRGB (float r) (float g) (float b) //specular color -* REFL (bool refl) //reflectivity flag, 0 for - no, 1 for yes -* REFR (bool refr) //refractivity flag, 0 for - no, 1 for yes -* REFRIOR (float ior) //index of refraction - for Fresnel effects -* SCATTER (float scatter) //scatter flag, 0 for - no, 1 for yes -* ABSCOEFF (float r) (float b) (float g) //absorption - coefficient for scattering -* RSCTCOEFF (float rsctcoeff) //reduced scattering - coefficient -* EMITTANCE (float emittance) //the emittance of the - material. Anything >0 makes the material a light source. - -Cameras are defined in the following fashion: - -* CAMERA //camera header -* RES (float x) (float y) //resolution -* FOVY (float fovy) //vertical field of - view half-angle. the horizonal angle is calculated from this and the - reslution -* ITERATIONS (float interations) //how many - iterations to refine the image, only relevant for supersampled antialiasing, - depth of field, area lights, and other distributed raytracing applications -* FILE (string filename) //file to output - render to upon completion -* frame (frame number) //start of a frame -* EYE (float x) (float y) (float z) //camera's position in - worldspace -* VIEW (float x) (float y) (float z) //camera's view - direction -* UP (float x) (float y) (float z) //camera's up vector - -Objects are defined in the following fashion: -* OBJECT (object ID) //object header -* (cube OR sphere OR mesh) //type of object, can - be either "cube", "sphere", or "mesh". Note that cubes and spheres are unit - sized and centered at the origin. 
-* material (material ID) //material to - assign this object -* frame (frame number) //start of a frame -* TRANS (float transx) (float transy) (float transz) //translation -* ROTAT (float rotationx) (float rotationy) (float rotationz) //rotation -* SCALE (float scalex) (float scaley) (float scalez) //scale - -An example scene file setting up two frames inside of a Cornell Box can be -found in the scenes/ directory. - -For meshes, note that the base code will only read in .obj files. For more -information on the .obj specification see http://en.wikipedia.org/wiki/Wavefront_.obj_file. - -An example of a mesh object is as follows: - -OBJECT 0 -mesh tetra.obj -material 0 -frame 0 -TRANS 0 5 -5 -ROTAT 0 90 0 -SCALE .01 10 10 - -Check the Google group for some sample .obj files of varying complexity. - -## THIRD PARTY CODE POLICY -* Use of any third-party code must be approved by asking on our Google Group. - If it is approved, all students are welcome to use it. Generally, we approve - use of third-party code that is not a core part of the project. For example, - for the ray tracer, we would approve using a third-party library for loading - models, but would not approve copying and pasting a CUDA function for doing - refraction. -* Third-party code must be credited in README.md. -* Using third-party code without its approval, including using another - student's code, is an academic integrity violation, and will result in you - receiving an F for the semester. - -## SELF-GRADING -* On the submission date, email your grade, on a scale of 0 to 100, to Harmony, - harmoli+cis565@seas.upenn.com, with a one paragraph explanation. Be concise and - realistic. Recall that we reserve 30 points as a sanity check to adjust your - grade. Your actual grade will be (0.7 * your grade) + (0.3 * our grade). We - hope to only use this in extreme cases when your grade does not realistically - reflect your work - it is either too high or too low. 
In most cases, we plan - to give you the exact grade you suggest. -* Projects are not weighted evenly, e.g., Project 0 doesn't count as much as - the path tracer. We will determine the weighting at the end of the semester - based on the size of each project. - -## SUBMISSION -Please change the README to reflect the answers to the questions we have posed -above. Remember: -* this is a renderer, so include images that you've made! -* be sure to back your claims for optimization with numbers and comparisons -* if you reference any other material, please provide a link to it -* you wil not e graded on how fast your path tracer runs, but getting close to - real-time is always nice -* if you have a fast GPU renderer, it is good to show case this with a video to - show interactivity. If you do so, please include a link. - -Be sure to open a pull request and to send Harmony your grade and why you -believe this is the grade you should get.
+## Features
+
+* Raycasting from a camera into a scene through a pixel grid
+* Diffuse surfaces
+* Perfect specular reflective surfaces
+* Cube intersection testing
+* Sphere surface point sampling
+* Stream compaction optimization
+
+## Additional features
+
+### Depth of field
+![ScreenShot](https://raw.githubusercontent.com/CyborgYL/Project3-Pathtracer/master/result/dof.PNG)
+The camera rays are jittered according to the object position and the camera position so that the image looks as if it were taken by a real camera with a large aperture.
+This is a simple calculation run once per ray; it adds only a few single instructions, so there is no noticeable change in performance.
+Time complexity: GPU O(1), CPU O(n)
+
+### Refraction (glass)
+![ScreenShot](https://raw.githubusercontent.com/CyborgYL/Project3-Pathtracer/master/result/pathtracer.PNG)
+Computes the direction of the refracted ray for a glass effect.
+There is no impact on performance because it redirects an existing ray instead of shooting new rays. A random number decides whether each ray reflects or refracts.
+Time complexity: GPU O(1), CPU O(n)
+
+### Motion blur
+![ScreenShot](https://raw.githubusercontent.com/CyborgYL/Project3-Pathtracer/master/result/motion.PNG)
+The object's position is altered between iterations while a single image is rendered.
+There is no impact on performance; the geometry position is simply updated before each iteration.
+Time complexity: GPU O(1), CPU O(1)
+
+### Stream compaction
+Timing for one iteration (in ms):
+![ScreenShot](https://raw.githubusercontent.com/CyborgYL/Project3-Pathtracer/master/result/performance.PNG)
+Because I'm using a GK110 (Kepler) GPU, I put the bounce loop inside the __global__ function instead of launching the kernel in a host-side loop. Stream compaction, by contrast, must launch a kernel from the host for every bounce, which incurs a large overhead for each pixel and each ray.
diff --git a/result/dof.PNG b/result/dof.PNG new file mode 100644 index 0000000..2a00811 Binary files /dev/null and b/result/dof.PNG differ diff --git a/result/motion.PNG b/result/motion.PNG new file mode 100644 index 0000000..a1d0845 Binary files /dev/null and b/result/motion.PNG differ diff --git a/result/pathtracer.PNG b/result/pathtracer.PNG new file mode 100644 index 0000000..dddb6c0 Binary files /dev/null and b/result/pathtracer.PNG differ diff --git a/result/performance.PNG b/result/performance.PNG new file mode 100644 index 0000000..3062cdb Binary files /dev/null and b/result/performance.PNG differ
diff --git a/src/interactions.h b/src/interactions.h
index 7bf6fab..cc93f4b 100644
--- a/src/interactions.h
+++ b/src/interactions.h
@@ -40,13 +40,22 @@ __host__ __device__ bool calculateScatterAndAbsorption(ray& r, float& depth, Abs
 // TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
 __host__ __device__ glm::vec3 calculateTransmissionDirection(glm::vec3 normal, glm::vec3 incident, float incidentIOR, float transmittedIOR) {
- return glm::vec3(0,0,0);
+ float n12 = incidentIOR / transmittedIOR;
+ float
temp = 1 - n12 * n12 * (1 - pow(glm::dot(normal, incident), 2)); // Snell's law discriminant
+
+ if (temp >= 0.0f){
+ return glm::normalize((-n12 * glm::dot(normal, incident) - sqrt(temp)) * normal + n12 * incident);
+ }
+ else
+ {
+ return calculateReflectionDirection(normal, incident); // total internal reflection
+ }
}

// TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
__host__ __device__ glm::vec3 calculateReflectionDirection(glm::vec3 normal, glm::vec3 incident) {
 //nothing fancy here
- return glm::vec3(0,0,0);
+ return glm::normalize(incident - 2.0f * normal * (glm::dot(incident, normal)));
}

// TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
@@ -55,6 +64,7 @@ __host__ __device__ Fresnel calculateFresnel(glm::vec3 normal, glm::vec3 inciden
 fresnel.reflectionCoefficient = 1;
 fresnel.transmissionCoefficient = 0;
+ return fresnel;
}

@@ -91,7 +101,11 @@ __host__ __device__ glm::vec3 calculateRandomDirectionInHemisphere(glm::vec3 nor
// non-cosine (uniform) weighted random direction generation.
// This should be much easier than if you had to implement calculateRandomDirectionInHemisphere.
__host__ __device__ glm::vec3 getRandomDirectionInSphere(float xi1, float xi2) {
- return glm::vec3(0,0,0);
+
+ float theta = TWO_PI * xi1; // azimuth in [0, 2*pi)
+ float phi = acos(2 * xi2 - 1);
+ float x = cos(phi);
+ return glm::vec3(sqrt(1 - x*x) * cos(theta), sqrt(1 - x*x) * sin(theta), x);
}

// TODO (PARTIALLY OPTIONAL): IMPLEMENT THIS FUNCTION
diff --git a/src/intersections.h b/src/intersections.h
index c9eafb6..f738582 100644
--- a/src/intersections.h
+++ b/src/intersections.h
@@ -13,6 +13,7 @@
 #include "cudaMat4.h"
 #include "utilities.h"
+#define PI 3.14159265
 // Some forward declarations
 __host__ __device__ glm::vec3 getPointOnRay(ray r, float t);
 __host__ __device__ glm::vec3 multiplyMV(cudaMat4 m, glm::vec4 v);
@@ -71,9 +72,108 @@ __host__ __device__ glm::vec3 getSignOfRay(ray r){

 // TODO: IMPLEMENT THIS FUNCTION
 // Cube intersection test, return -1 if no intersection, otherwise, distance to intersection
+__host__ __device__ void swap(float &f1, float &f2)
+{
+ float temp;
+ temp = f2;
+ f2 = f1;
+ f1 = temp;
+}
+
 __host__ __device__ float boxIntersectionTest(staticGeom box, ray r, glm::vec3& intersectionPoint, glm::vec3& normal){
+
+ float radius = .5;
- return -1;
+ glm::vec3 ro = multiplyMV(box.inverseTransform, glm::vec4(r.origin, 1.0f));
+ glm::vec3 rd = glm::normalize(multiplyMV(box.inverseTransform, glm::vec4(r.direction, 0.0f)));
+
+ ray rt; rt.origin = ro; rt.direction = rd;
+ float Tnear = -FLT_MAX, Tfar = FLT_MAX; // note: FLT_MIN is the smallest positive float, not the most negative one
+ //X plane
+ if (rd.x == 0)
+ {
+ if (ro.x < -radius || ro.x > radius)
+ return -1;
+ }
+ float T1 = (-radius - ro.x) / rd.x;
+ float T2 = (radius - ro.x) / rd.x;
+ if (T1 > T2)
+ swap(T1, T2);
+ if (T1 > Tnear)
+ Tnear = T1;
+ if (T2 < Tfar)
+ Tfar = T2;
+ if (Tnear > Tfar)
+ return -1;
+ if (Tfar < 0)
+ return -1;
+ //Y plane
+ if (rd.y == 0)
+ {
+ if (ro.y < -radius || ro.y > radius)
+ return -1;
+ }
+ T1 = (-radius - ro.y) / rd.y;
+ T2 = (radius - ro.y) / rd.y;
+ if (T1 > T2)
+ swap(T1, T2);
+ if (T1 > Tnear)
+ Tnear = T1;
+ if (T2 < Tfar)
+ Tfar
= T2; + if (Tnear > Tfar) + return -1; + if (Tfar < 0) + return -1; + //Z plane + if (rd.z == 0) + { + if (ro.z < -radius || ro.z > radius) + return -1; + } + T1 = (-radius - ro.z) / rd.z; + T2 = (radius - ro.z) / rd.z; + if (T1 > T2) + swap(T1, T2); + if (T1 > Tnear) + Tnear = T1; + if (T2 < Tfar) + Tfar = T2; + if (Tnear > Tfar) + return -1; + if (Tfar < 0) + return -1; + //************loop completed*********** + glm::vec3 localPoint = getPointOnRay(rt, Tnear); + glm::vec3 realIntersectionPoint = multiplyMV(box.transform, glm::vec4(localPoint, 1.0)); + glm::vec3 realOrigin = multiplyMV(box.transform, glm::vec4(0, 0, 0, 1)); + + intersectionPoint = realIntersectionPoint; + //calculate normal + glm::vec3 localNormal; + float eps = 0.001f; + if (abs(localPoint.x - 0.5) < eps) + localNormal = glm::vec3(1, 0, 0); + else if (abs(localPoint.x + 0.5) < eps){ + localNormal = glm::vec3(-1, 0, 0); + } + else if (abs(localPoint.y - 0.5) < eps){ + localNormal = glm::vec3(0, 1, 0); + } + else if (abs(localPoint.y + 0.5) < eps){ + localNormal = glm::vec3(0, -1, 0); + } + else if (abs(localPoint.z - 0.5) < eps){ + localNormal = glm::vec3(0, 0, 1); + } + else if (abs(localPoint.z + 0.5) < eps){ + localNormal = glm::vec3(0, 0, -1); + } + + glm::vec3 realNormal = multiplyMV(box.transform, glm::vec4(localNormal, 1.0f)); + normal = glm::normalize(realNormal - realOrigin); + return glm::length(r.origin - realIntersectionPoint); + } // LOOK: Here's an intersection test example from a sphere. Now you just need to figure out cube and, optionally, triangle. 
@@ -177,8 +277,13 @@ __host__ __device__ glm::vec3 getRandomPointOnCube(staticGeom cube, float random
 // TODO: IMPLEMENT THIS FUNCTION
 // Generates a random point on a given sphere
 __host__ __device__ glm::vec3 getRandomPointOnSphere(staticGeom sphere, float randomSeed){
-
- return glm::vec3(0,0,0);
+ thrust::default_random_engine rng(hash(randomSeed));
+ thrust::uniform_real_distribution<float> u01(0, 1);
+ thrust::uniform_real_distribution<float> u02(0, 2*PI);
+ float theta = acos(2*(float)u01(rng)-1);
+ float phi = (float)u02(rng);
+ glm::vec3 point(sin(theta)*cos(phi), sin(theta) * sin(phi), cos(theta));
+ return point+sphere.translation;
}

#endif
diff --git a/src/main.cpp b/src/main.cpp
index b002500..3fd18a1 100755
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -7,6 +7,7 @@
 #include "main.h"
 #define GLEW_STATIC
+//#define STREAM_COMPACTION

 //-------------------------------
 //-------------MAIN--------------
@@ -106,8 +107,12 @@ void runCuda(){
 }

 // execute the kernel
- cudaRaytraceCore(dptr, renderCam, targetFrame, iterations, materials, renderScene->materials.size(), geoms, renderScene->objects.size() );
-
+#ifdef STREAM_COMPACTION
+ cudaRaytraceCoreSC(dptr, renderCam, targetFrame, iterations, materials, renderScene->materials.size(), geoms, renderScene->objects.size());
+#else
+ cudaRaytraceCore(dptr, renderCam, targetFrame, iterations, materials, renderScene->materials.size(), geoms, renderScene->objects.size());
+#endif
+
 // unmap buffer object
 cudaGLUnmapBufferObject(pbo);
 } else {
diff --git a/src/raytraceKernel.cu b/src/raytraceKernel.cu
index 9c7bc7d..f38abe2 100644
--- a/src/raytraceKernel.cu
+++ b/src/raytraceKernel.cu
@@ -4,10 +4,16 @@
 // Rob Farber for CUDA-GL interop, from CUDA Supercomputing For The Masses: http://www.drdobbs.com/architecture-and-design/cuda-supercomputing-for-the-masses-part/222600097
 // Peter Kutz and Yining Karl Li's GPU Pathtracer: http://gpupathtracer.blogspot.com/
 // Yining Karl Li's TAKUA Render, a massively parallel pathtracing renderer:
http://www.yiningkarlli.com
+#include
+#include
+#include
+#include
 #include
 #include
 #include
+#include
+#include
 #include "sceneStructs.h"
 #include "glm/glm.hpp"
@@ -16,6 +22,11 @@
 #include "intersections.h"
 #include "interactions.h"

+#define RAY_DEPTH 5
+#define TILE_SIZE 8
+//#define DOF 11.2
+//#define MOTION_BLUR 1
+
 void checkCUDAError(const char *msg) {
 cudaError_t err = cudaGetLastError();
 if( cudaSuccess != err) {
@@ -23,7 +34,13 @@ void checkCUDAError(const char *msg) {
 exit(EXIT_FAILURE);
 }
}
-
+struct pixelRayUnit
+{
+ ray pixelRay;
+ int index,x,y;
+ bool isFinished;
+ int currentDepth; // depth counter (was declared bool, but it is incremented and compared against rayDepth)
+};
 // LOOK: This function demonstrates how to use thrust for random number generation on the GPU!
 // Function that generates static.
 __host__ __device__ glm::vec3 generateRandomNumberFromThread(glm::vec2 resolution, float time, int x, int y){
@@ -39,8 +56,17 @@ __host__ __device__ glm::vec3 generateRandomNumberFromThread(glm::vec2 resolutio

 // Function that does the initial raycast from the camera
 __host__ __device__ ray raycastFromCameraKernel(glm::vec2 resolution, float time, int x, int y, glm::vec3 eye, glm::vec3 view, glm::vec3 up, glm::vec2 fov){
 ray r;
- r.origin = glm::vec3(0,0,0);
- r.direction = glm::vec3(0,0,-1);
+ r.origin = eye;
+ r.direction = glm::normalize(view);
+ //suppose the distance from eye to screen is 1
+ float fovRad = fov.y / 180 * PI;
+ float pixelHeight = tan(fovRad /2) / (resolution.y /2); // glm's vec3::length() returns the component count, not the magnitude, so the view-length factor is dropped (the view direction is unit length)
+ glm::vec2 screenCenterIdx(resolution.x / 2.0f, resolution.y / 2.0f);
+ glm::vec3 screenCenter = getPointOnRay(r, 1);
+ glm::vec3 right = glm::cross(view, up);
+ glm::vec2 screenPointIdx((float)x - screenCenterIdx.x,(float)y - screenCenterIdx.y);
+ glm::vec3 screenPoint = screenPointIdx.x * pixelHeight *right - screenPointIdx.y * pixelHeight * up + screenCenter;
+ r.direction = glm::normalize(screenPoint - eye);
 return r;
}

@@ -87,30 +113,156 @@ __global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3*
 PBOpos[index].z = color.z;
 }
}

+__device__ float intersectionTest(staticGeom *geoms, int numberOfGeoms, ray currentRay, staticGeom *&intersectionGeom, glm::vec3 &intersectionPoint, glm::vec3 &intersectionNormal){
+ float Tnear = FLT_MAX;
+ for (size_t i = 0; i < numberOfGeoms; i++)
+ {
+ glm::vec3 tempPoint, tempNormal;
+ float distance = -1.0f; // initialize so unknown geometry types count as misses
+ if (geoms[i].type == CUBE)
+ {
+ distance = boxIntersectionTest(geoms[i], currentRay, tempPoint, tempNormal);
+ }
+ else if (geoms[i].type == SPHERE)
+ {
+ distance = sphereIntersectionTest(geoms[i], currentRay, tempPoint, tempNormal);
+ }
+ if (distance < Tnear && distance != -1.0f)
+ {
+ Tnear = distance;
+ intersectionPoint = tempPoint;
+ intersectionNormal = tempNormal;
+ intersectionGeom = &geoms[i];
+ }
+ }
+ return Tnear;
+}

 // TODO: IMPLEMENT THIS FUNCTION
 // Core raytracer kernel
 __global__ void raytraceRay(glm::vec2 resolution, float time, cameraData cam, int rayDepth, glm::vec3* colors,
- staticGeom* geoms, int numberOfGeoms){
+ staticGeom* geoms, int numberOfGeoms, material* materials, int numberOfMaterials){

 int x = (blockIdx.x * blockDim.x) + threadIdx.x;
 int y = (blockIdx.y * blockDim.y) + threadIdx.y;
 int index = x + (y * resolution.x);
+ int arraySize = resolution.x * resolution.y;
+ int bounceCount = 0;
+ ray currentRay;
+ glm::vec3 color(1,1,1);
+ currentRay = raycastFromCameraKernel(resolution, time, x, y, cam.position, cam.view, cam.up, cam.fov);
+#ifdef DOF
+ float focalLength = DOF;
+ glm::vec3 aimingPosition = currentRay.origin + focalLength * currentRay.direction;
+ glm::vec3 rand = generateRandomNumberFromThread(cam.resolution, time * 3, x, y);
+ glm::vec3 camPosition = glm::vec3(cam.position.x + (float)rand.x, cam.position.y + (float)rand.y, cam.position.z + (float)rand.z);
+ currentRay.origin = camPosition;
+ currentRay.direction = glm::normalize(aimingPosition - camPosition);
+#endif
+ if ((x <= resolution.x && y <= resolution.y)){
+ for (bounceCount = 0; bounceCount < rayDepth; bounceCount++)
+ {
+ staticGeom *bounceGeom = NULL;
+ glm::vec3 bouncePoint, bounceNormal;
+ intersectionTest(geoms, numberOfGeoms, currentRay, bounceGeom, bouncePoint, bounceNormal);
+ glm::vec3 rnd = generateRandomNumberFromThread(cam.resolution, time + (bounceCount + 1), x, y);
+ if (bounceGeom != NULL)
+ {
+ //calculate diffuse color
+ if (materials[bounceGeom->materialid].emittance > 0) //light
+ {
+ color *= materials[bounceGeom->materialid].emittance * materials[bounceGeom->materialid].color;
+
break; + } + if (bounceCount == rayDepth -1) + { + color *= 0; + break; + } + + glm::vec3 nextRayDir = calculateRandomDirectionInHemisphere(bounceNormal,rnd.x,rnd.y); - if((x<=resolution.x && y<=resolution.y)){ + glm::vec3 H = glm::normalize(nextRayDir - currentRay.direction); + float NdotH = glm::dot(H, bounceNormal); + if (materials[bounceGeom->materialid].hasRefractive <= 0) + { + + color *= materials[bounceGeom->materialid].color; + } + //*************has reflection******************** + if (materials[bounceGeom->materialid].hasReflective && rnd.z < materials[bounceGeom->materialid].hasReflective) + { + Fresnel fresnel; + glm::vec3 reflectionDir = calculateReflectionDirection(bounceNormal, currentRay.direction); + float io = glm::dot(currentRay.direction, bounceNormal); + currentRay.origin = bouncePoint + 0.001f * reflectionDir; + currentRay.direction = reflectionDir; - colors[index] = generateRandomNumberFromThread(resolution, time, x, y); + continue; + } + //********************Refractive*********************** + else if (materials[bounceGeom->materialid].hasRefractive) + { + Fresnel fresnel; + glm::vec3 refractionDir; + float io = glm::dot(currentRay.direction, bounceNormal); + + if (rnd.z < 0.9f) + { + if (io < 0) + { + refractionDir = calculateTransmissionDirection(bounceNormal, currentRay.direction, 1.0f, materials[bounceGeom->materialid].indexOfRefraction); + } + else + { + refractionDir = calculateTransmissionDirection(-bounceNormal, currentRay.direction, materials[bounceGeom->materialid].indexOfRefraction, 1.0); + } + currentRay.origin = bouncePoint + 0.001f * refractionDir; + currentRay.direction = refractionDir; + + continue; + } + else + { + glm::vec3 reflectionDir = calculateReflectionDirection(bounceNormal, currentRay.direction); + currentRay.direction = reflectionDir; + currentRay.origin = bouncePoint + 0.001f * reflectionDir; + + continue; + } + } + //**********************Diffuse********************* + else + { + currentRay.direction = 
nextRayDir; + currentRay.origin = bouncePoint + 0.001f * nextRayDir; + } + + } + else + { + color *= 0; + break; + } + } + + colors[index] = colors[index] / (time + 1)*time + color / (time + 1); + //colors[index] = color; } } // TODO: FINISH THIS FUNCTION // Wrapper for the __global__ call that sets up the kernel calls and does a ton of memory management void cudaRaytraceCore(uchar4* PBOpos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms, int numberOfGeoms){ - - int traceDepth = 1; //determines how many bounces the raytracer traces + clock_t clockBegin, clockEnd; + clockBegin = clock(); + int traceDepth = RAY_DEPTH; //determines how many bounces the raytracer traces // set up crucial magic - int tileSize = 8; + int tileSize = TILE_SIZE; dim3 threadsPerBlock(tileSize, tileSize); dim3 fullBlocksPerGrid((int)ceil(float(renderCam->resolution.x)/float(tileSize)), (int)ceil(float(renderCam->resolution.y)/float(tileSize))); @@ -132,11 +284,29 @@ void cudaRaytraceCore(uchar4* PBOpos, camera* renderCam, int frame, int iteratio newStaticGeom.inverseTransform = geoms[i].inverseTransforms[frame]; geomList[i] = newStaticGeom; } - + material *materialList = new material[numberOfMaterials]; + for (size_t i = 0; i < numberOfMaterials; i++) + { + material newMaterial; + newMaterial.color = materials[i].color; + newMaterial.emittance = materials[i].emittance; + newMaterial.absorptionCoefficient = materials[i].absorptionCoefficient; + newMaterial.specularColor = materials[i].specularColor; + newMaterial.hasReflective = materials[i].hasReflective; + newMaterial.hasRefractive = materials[i].hasRefractive; + newMaterial.hasScatter = materials[i].hasScatter; + newMaterial.indexOfRefraction = materials[i].indexOfRefraction; + newMaterial.reducedScatterCoefficient = materials[i].reducedScatterCoefficient; + newMaterial.specularExponent = materials[i].specularExponent; + materialList[i] = newMaterial; + } staticGeom* cudageoms = NULL; 
cudaMalloc((void**)&cudageoms, numberOfGeoms*sizeof(staticGeom));
 cudaMemcpy( cudageoms, geomList, numberOfGeoms*sizeof(staticGeom), cudaMemcpyHostToDevice);
-
+ material *dev_material = NULL;
+ cudaMalloc((void**)&dev_material, numberOfMaterials * sizeof(material));
+ cudaMemcpy(dev_material, materialList, numberOfMaterials*sizeof(material), cudaMemcpyHostToDevice);
+
 // package camera
 cameraData cam;
 cam.resolution = renderCam->resolution;
@@ -145,8 +315,19 @@ void cudaRaytraceCore(uchar4* PBOpos, camera* renderCam, int frame, int iteratio
 cam.up = renderCam->ups[frame];
 cam.fov = renderCam->fov;

+ //******************Motion blur*****************************
+#ifdef MOTION_BLUR
+ int motionIndex = 5;
+ float t = (float)(iterations / 50.0f);
+ float trans = sin(0.5*t);
+ geoms[motionIndex].translations[0].x = trans;
+ glm::mat4 transform = utilityCore::buildTransformationMatrix(geoms[motionIndex].translations[0], geoms[motionIndex].rotations[0], geoms[motionIndex].scales[0]);
+ geoms[motionIndex].transforms[0] = utilityCore::glmMat4ToCudaMat4(transform);
+ geoms[motionIndex].inverseTransforms[0] = utilityCore::glmMat4ToCudaMat4(glm::inverse(transform));
+#endif
+
 // kernel launches
- raytraceRay<<<fullBlocksPerGrid, threadsPerBlock>>>(renderCam->resolution, (float)iterations, cam, traceDepth, cudaimage, cudageoms, numberOfGeoms);
+ raytraceRay<<<fullBlocksPerGrid, threadsPerBlock>>>(renderCam->resolution, (float)iterations, cam, traceDepth, cudaimage, cudageoms, numberOfGeoms, dev_material, numberOfMaterials);

 sendImageToPBO<<<fullBlocksPerGrid, threadsPerBlock>>>(PBOpos, renderCam->resolution, cudaimage);
@@ -156,10 +337,253 @@ void cudaRaytraceCore(uchar4* PBOpos, camera* renderCam, int frame, int iteratio
 // free up stuff, or else we'll leak memory like a madman
 cudaFree( cudaimage );
 cudaFree( cudageoms );
- delete geomList;
-
+ delete[] geomList;
+ delete[] materialList;
+ cudaFree(dev_material);
 // make certain the kernel has completed
 cudaThreadSynchronize();

 checkCUDAError("Kernel failed!");
+
+ clockEnd = clock();
+ printf("one iteration completed in %d ms\n",
clockEnd - clockBegin);
+}
+
+__global__ void raytraceRaySC(glm::vec2 resolution, float time, cameraData cam, int rayDepth, glm::vec3* colors,
+	staticGeom* geoms, int numberOfGeoms, material* materials, int numberOfMaterials, pixelRayUnit *rayPool)
+{
+	int idx = (blockIdx.x * blockDim.x) + threadIdx.x;
+	int index = rayPool[idx].index;
+	glm::vec3 color(1, 1, 1);
+	if (rayPool[idx].currentDepth > 0)
+	{
+		color = colors[index];
+	}
+	rayPool[idx].currentDepth++;
+
+	if (rayPool[idx].currentDepth >= rayDepth)
+	{
+		color *= 0;
+		rayPool[idx].isFinished = true;
+		colors[index] = colors[index] / (time + 1)*time + color / (time + 1);
+		return;
+	}
+#ifdef DOF
+	float focalLength = DOF;
+	glm::vec3 aimingPosition = rayPool[idx].pixelRay.origin + focalLength * rayPool[idx].pixelRay.direction;
+	glm::vec3 rand = generateRandomNumberFromThread(cam.resolution, time * 3, rayPool[idx].x, rayPool[idx].y);
+	glm::vec3 camPosition = glm::vec3(cam.position.x + (float)rand.x, cam.position.y + (float)rand.y, cam.position.z + (float)rand.z);
+	rayPool[idx].pixelRay.origin = camPosition;
+	rayPool[idx].pixelRay.direction = glm::normalize(aimingPosition - camPosition);
+#endif
+	if (rayPool[idx].currentDepth <= rayDepth && !rayPool[idx].isFinished)
+	{
+		staticGeom *bounceGeom = NULL;
+		glm::vec3 bouncePoint, bounceNormal;
+		intersectionTest(geoms, numberOfGeoms, rayPool[idx].pixelRay, bounceGeom, bouncePoint, bounceNormal);
+		glm::vec3 rnd = generateRandomNumberFromThread(cam.resolution, time + (rayPool[idx].currentDepth + 1), rayPool[idx].x, rayPool[idx].y);
+
+		if (bounceGeom != NULL)
+		{
+
+			//calculate diffuse color
+			if (materials[bounceGeom->materialid].emittance > 0) //light
+			{
+				color *= materials[bounceGeom->materialid].emittance * materials[bounceGeom->materialid].color;
+				rayPool[idx].isFinished = true;
+				colors[index] = colors[index] / (time + 1)*time + color / (time + 1);
+				return;
+			}
+
+
+			glm::vec3 nextRayDir = calculateRandomDirectionInHemisphere(bounceNormal,
rnd.x, rnd.y);
+
+			glm::vec3 H = glm::normalize(nextRayDir - rayPool[idx].pixelRay.direction);
+			float NdotH = glm::dot(H, bounceNormal);
+			if (materials[bounceGeom->materialid].hasRefractive <= 0)
+			{
+				color *= materials[bounceGeom->materialid].color;
+			}
+			//*************has reflection********************
+			if (materials[bounceGeom->materialid].hasReflective && rnd.z < materials[bounceGeom->materialid].hasReflective)
+			{
+
+				glm::vec3 reflectionDir = calculateReflectionDirection(bounceNormal, rayPool[idx].pixelRay.direction);
+				float io = glm::dot(rayPool[idx].pixelRay.direction, bounceNormal);
+				rayPool[idx].pixelRay.origin = bouncePoint + 0.001f * reflectionDir;
+				rayPool[idx].pixelRay.direction = reflectionDir;
+
+			}
+			//********************Refractive***********************
+			else if (materials[bounceGeom->materialid].hasRefractive)
+			{
+				float io = glm::dot(rayPool[idx].pixelRay.direction, bounceNormal);
+				glm::vec3 refractionDir;
+
+				if (rnd.z < 0.9f)
+				{
+					if (io < 0)
+					{
+						refractionDir = calculateTransmissionDirection(bounceNormal, rayPool[idx].pixelRay.direction, 1.0f, materials[bounceGeom->materialid].indexOfRefraction);
+					}
+					else
+					{
+						refractionDir = calculateTransmissionDirection(-bounceNormal, rayPool[idx].pixelRay.direction, materials[bounceGeom->materialid].indexOfRefraction, 1.0);
+					}
+					rayPool[idx].pixelRay.origin = bouncePoint + 0.001f * refractionDir;
+					rayPool[idx].pixelRay.direction = refractionDir;
+
+				}
+				else
+				{
+					glm::vec3 reflectionDir = calculateReflectionDirection(bounceNormal, rayPool[idx].pixelRay.direction);
+					rayPool[idx].pixelRay.direction = reflectionDir;
+					rayPool[idx].pixelRay.origin = bouncePoint + 0.001f * reflectionDir;
+
+				}
+			}
+			//**********************Diffuse*********************
+			else
+			{
+				rayPool[idx].pixelRay.direction = nextRayDir;
+				rayPool[idx].pixelRay.origin = bouncePoint + 0.001f * nextRayDir;
+			}
+
+		}
+		else
+		{
+			color *= 0;
+			rayPool[idx].isFinished = true;
+			colors[index] = colors[index] / (time +
1)*time + color / (time + 1);
+			return;
+		}
+	}
+	colors[index] = colors[index] / (time + 1)*time + color / (time + 1);
+}
+
+__global__ void initRayPool( glm::vec2 resolution, float time, cameraData cam, int rayDepth, glm::vec3* colors,
+	staticGeom* geoms, int numberOfGeoms, material* materials, int numberOfMaterials, pixelRayUnit *rayPool)
+{
+	int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+	int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+	int index = x + (y * resolution.x);
+	if (x < resolution.x && y < resolution.y)
+	{
+		rayPool[index].pixelRay = raycastFromCameraKernel(resolution, time, x, y, cam.position, cam.view, cam.up, cam.fov);
+		rayPool[index].isFinished = false;
+		rayPool[index].index = index;
+		rayPool[index].x = x;
+		rayPool[index].y = y;
+		rayPool[index].currentDepth = 0;
+	}
+}
+
+struct is_finished
+{
+	__host__ __device__
+	bool operator()(const pixelRayUnit x)
+	{
+		return x.isFinished;
+	}
+};
+
+void cudaRaytraceCoreSC(uchar4* PBOpos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms, int numberOfGeoms){
+	clock_t clockBegin, clockEnd;
+	clockBegin = clock();
+	int traceDepth = RAY_DEPTH; //determines how many bounces the raytracer traces
+	// set up crucial magic
+	int tileSize = TILE_SIZE;
+	dim3 threadsPerBlock(tileSize, tileSize);
+	dim3 fullBlocksPerGrid((int)ceil(float(renderCam->resolution.x) / float(tileSize)), (int)ceil(float(renderCam->resolution.y) / float(tileSize)));
+
+	// send image to GPU
+	glm::vec3* cudaimage = NULL;
+	cudaMalloc((void**)&cudaimage, (int)renderCam->resolution.x*(int)renderCam->resolution.y*sizeof(glm::vec3));
+	cudaMemcpy(cudaimage, renderCam->image, (int)renderCam->resolution.x*(int)renderCam->resolution.y*sizeof(glm::vec3), cudaMemcpyHostToDevice);
+
+	// package geometry and materials and sent to GPU
+	staticGeom* geomList = new staticGeom[numberOfGeoms];
+	for (int i = 0; i < numberOfGeoms; i++)
+	{
+		staticGeom newStaticGeom;
+		newStaticGeom.type = geoms[i].type;
+		newStaticGeom.materialid = geoms[i].materialid;
+		newStaticGeom.translation = geoms[i].translations[frame];
+		newStaticGeom.rotation = geoms[i].rotations[frame];
+		newStaticGeom.scale = geoms[i].scales[frame];
+		newStaticGeom.transform = geoms[i].transforms[frame];
+		newStaticGeom.inverseTransform = geoms[i].inverseTransforms[frame];
+		geomList[i] = newStaticGeom;
+	}
+	material *materialList = new material[numberOfMaterials];
+	for (size_t i = 0; i < numberOfMaterials; i++)
+	{
+		materialList[i] = materials[i];
+	}
+	staticGeom* cudageoms = NULL;
+	cudaMalloc((void**)&cudageoms, numberOfGeoms*sizeof(staticGeom));
+	cudaMemcpy(cudageoms, geomList, numberOfGeoms*sizeof(staticGeom), cudaMemcpyHostToDevice);
+	material *dev_material = NULL;
+	cudaMalloc((void**)&dev_material, numberOfMaterials * sizeof(material));
+	cudaMemcpy(dev_material, materialList, numberOfMaterials*sizeof(material), cudaMemcpyHostToDevice);
+
+	// package camera
+	cameraData cam;
+	cam.resolution = renderCam->resolution;
+	cam.position = renderCam->positions[frame];
+	cam.view =
renderCam->views[frame];
+	cam.up = renderCam->ups[frame];
+	cam.fov = renderCam->fov;
+
+	//stream compaction
+	pixelRayUnit *rayPool = NULL;
+	int rayPoolSize = cam.resolution.x * cam.resolution.y;
+	cudaMalloc((void**)&rayPool, (int)renderCam->resolution.x*(int)renderCam->resolution.y * sizeof(pixelRayUnit));
+	initRayPool<<<fullBlocksPerGrid, threadsPerBlock>>>(renderCam->resolution, (float)iterations, cam, traceDepth, cudaimage, cudageoms, numberOfGeoms, dev_material, numberOfMaterials, rayPool);
+	int count = 0;
+	tileSize = 16;
+
+	while (rayPoolSize > 0 && count < traceDepth)
+	{
+		fullBlocksPerGrid = (int)ceil(float(rayPoolSize) / float(tileSize));
+
+		raytraceRaySC<<<fullBlocksPerGrid, tileSize>>>(renderCam->resolution, (float)iterations, cam, traceDepth, cudaimage, cudageoms, numberOfGeoms, dev_material, numberOfMaterials, rayPool);
+
+		count++;
+		thrust::device_ptr<pixelRayUnit> iteratorStart(rayPool);
+		thrust::device_ptr<pixelRayUnit> iteratorEnd = iteratorStart + rayPoolSize;
+		iteratorEnd = thrust::remove_if(iteratorStart, iteratorEnd, is_finished());
+		rayPoolSize = (int)(iteratorEnd - iteratorStart);
+	}
+	tileSize = TILE_SIZE;
+	threadsPerBlock = dim3(tileSize, tileSize);
+	fullBlocksPerGrid = dim3((int)ceil(float(renderCam->resolution.x) / float(tileSize)), (int)ceil(float(renderCam->resolution.y) / float(tileSize)));
+
+	sendImageToPBO<<<fullBlocksPerGrid, threadsPerBlock>>>(PBOpos, renderCam->resolution, cudaimage);
+
+	// retrieve image from GPU
+	cudaMemcpy(renderCam->image, cudaimage, (int)renderCam->resolution.x*(int)renderCam->resolution.y*sizeof(glm::vec3), cudaMemcpyDeviceToHost);
+
+	// free up stuff, or else we'll leak memory like a madman
+	cudaFree(cudaimage);
+	cudaFree(cudageoms);
+	delete[] geomList;
+	delete[] materialList;
+	cudaFree(dev_material);
+	cudaFree(rayPool);
+	// make certain the kernel has completed
+	cudaThreadSynchronize();
+
+	checkCUDAError("Kernel failed!");
+
+	clockEnd = clock();
+	printf("one iteration completed in %d ms\n", clockEnd - clockBegin);
+}
\ No newline at end of file
diff --git a/src/raytraceKernel.h b/src/raytraceKernel.h
index 984e89f..152f79c 100755
--- a/src/raytraceKernel.h
+++ b/src/raytraceKernel.h
@@ -15,5 +15,6 @@
 #include "sceneStructs.h"
 
 void cudaRaytraceCore(uchar4* pos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms, int numberOfGeoms);
+void cudaRaytraceCoreSC(uchar4* pos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms, int numberOfGeoms);
 
 #endif
diff --git a/src/utilities.cpp b/src/utilities.cpp
index a8e5d90..6f0ae4d 100755
--- a/src/utilities.cpp
+++ b/src/utilities.cpp
@@ -4,7 +4,7 @@
 // File: utilities.cpp
 // A collection/kitchen sink of generally useful functions
 
-#define GLM_FORCE_RADIANS
+//#define GLM_FORCE_RADIANS
 
 #include 
 #include 
diff --git a/windows/Project3-Pathtracer/Project3-Pathtracer/Project3-Pathtracer.vcxproj b/windows/Project3-Pathtracer/Project3-Pathtracer/Project3-Pathtracer.vcxproj
index c45dd79..db91332 100644
--- a/windows/Project3-Pathtracer/Project3-Pathtracer/Project3-Pathtracer.vcxproj
+++ b/windows/Project3-Pathtracer/Project3-Pathtracer/Project3-Pathtracer.vcxproj
@@ -1,5 +1,5 @@
-
+
 
   Debug
@@ -19,16 +19,18 @@
   Application
   true
   MultiByte
+  v120
 
   Application
   false
   true
   MultiByte
+  v120
 
-
+
@@ -95,6 +97,6 @@
-
+
\ No newline at end of file
diff --git a/windows/Project3-Pathtracer/Project3-Pathtracer/test.0.bmp b/windows/Project3-Pathtracer/Project3-Pathtracer/test.0.bmp
new file mode 100644
index 0000000..d1fa4bb
Binary files /dev/null and b/windows/Project3-Pathtracer/Project3-Pathtracer/test.0.bmp differ
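Note on the stream-compaction loop in `cudaRaytraceCoreSC`: each bounce, `thrust::remove_if` with the `is_finished` predicate shrinks the ray pool so the next kernel launch covers only live rays. The host-side C++ sketch below illustrates the same compaction idea with `std::remove_if`; the `PixelRay` struct and `compactRays` helper are hypothetical stand-ins for `pixelRayUnit` and the thrust call, not code from this patch.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the pixelRayUnit struct in raytraceKernel.cu.
struct PixelRay {
    int index;        // pixel this ray contributes to
    bool isFinished;  // set when the ray hits a light, misses, or exceeds depth
};

// Host-side analogue of the thrust::remove_if compaction step:
// partitions finished rays out of the pool and returns the new pool size,
// mirroring rayPoolSize = (int)(iteratorEnd - iteratorStart).
int compactRays(std::vector<PixelRay>& pool) {
    auto newEnd = std::remove_if(pool.begin(), pool.end(),
                                 [](const PixelRay& r) { return r.isFinished; });
    pool.erase(newEnd, pool.end());
    return static_cast<int>(pool.size());
}
```

As in the patch, the caller then resizes the launch grid from the returned pool size, so threads for terminated rays simply stop being scheduled on later bounces.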
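Both kernels end with `colors[index] = colors[index] / (time + 1)*time + color / (time + 1)`, an incremental mean: with `time` iterations already averaged into the buffer, the new sample is folded in with weight `1/(time + 1)`. A minimal host-side sketch of that update follows; the `accumulate` helper is hypothetical, introduced only to isolate the arithmetic.

```cpp
// Incremental mean over pathtrace iterations: given a running average of n
// samples, fold in one new sample. Mirrors the kernels' expression
// colors[index] = colors[index] / (n + 1) * n + sample / (n + 1).
float accumulate(float oldMean, float sample, int n) {
    return oldMean / (n + 1) * n + sample / (n + 1);
}
```

This keeps a converging average without storing per-pixel sample counts or a separate accumulation buffer.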