Your project is to establish the mathematical foundation for an intuitive camera control system for 3D touch-screen applications utilizing an overhead 3rd person view of an environment.
The environment is a 3D left-handed Cartesian coordinate system called "World Space".
"World Space" contains a number of "child" Cartesian coordinate systems called "child frames", which may themselves contain their own children.
A single affine 4x4 transformation matrix specifies the transformation of points from a parent frame to a child frame and is termed the "local transform".
The hierarchy of coordinate systems is termed the "scene graph".
A child frame of "World Space" is named "LookAt" (a 3D perspective camera will be aimed at the origin of this frame).
The LookAt local transform is a single translation in the XZ plane (Y=0) named "M_la".
A child frame of "LookAt" is named "WorldYRotation".
The "WorldYRotation" local transform is a single rotation around the Y axis named "M_yrot"
A child frame of "WorldYRotation" is named "XRotation".
The XRotation local transform is a non-zero rotation around the X axis named "M_xrot"
A child frame of "XRotation" is named "CameraSpace"
The CameraSpace local transform (the "PushBack") is a translation matrix in the negative Z direction named "M_ztrans".
A child coordinate system of CameraSpace is named "ViewportSpace"
The ViewportSpace local transform is a perspective projection matrix which remains constant named "M_proj"
(The camera renders looking in the positive-Z direction.)
A child coordinate system of ViewportSpace is named "ScreenSpace"
The ScreenSpace local transform is composed of scaling and translation in the X and Y axes and is named "M_ss".
The concatenation of (M_ztrans * M_xrot * M_yrot * M_la) is termed "M_view"
Thus, to transform a point from World Space to Screen Space, the mathematical operations are:
P' = M_ss * M_proj * M_view * P
where P is the point (x,y,z,1) in the "World Space" coordinate system
P' is the point (x0/w, y0/w, z0/w, 1) in the "Screen Space" coordinate system, where (x0, y0, z0, w) is the homogeneous result of the matrix product before the divide by w
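The chain above can be sketched in a few lines of plain Python. This is only an illustration: the helper names (mat_mul, transform_point, world_to_screen) are made up for this sketch, and it assumes the column-vector convention implied by P' = M * P, so a translation sits in the fourth column. The matrices written out in this brief place the translation in the fourth row, so transpose as needed for your engine.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(m, p):
    """Apply m to the column vector p = (x, y, z, 1), then divide by w."""
    x0, y0, z0, w = (sum(m[i][k] * p[k] for k in range(4)) for i in range(4))
    return (x0 / w, y0 / w, z0 / w, 1.0)


def world_to_screen(m_ss, m_proj, m_view, p):
    """P' = M_ss * M_proj * M_view * P, followed by the perspective divide.

    Dividing after M_ss is equivalent to dividing after M_proj as long as
    M_ss is a pure scale/translation (it leaves w unchanged).
    """
    return transform_point(mat_mul(m_ss, mat_mul(m_proj, m_view)), p)
```

In Unity the same product would presumably be built from Matrix4x4 values; the point here is only the order of multiplication and where the divide by w happens.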
Now that we've established the foundations of 3D computer graphics... :-D
Holding the following constant: M_ss, M_proj, M_ztrans, M_xrot, M_yrot.
Assuming M_la is of the form:
1, 0, 0, 0
0, 1, 0, 0
0, 0, 1, 0
x_la, 0, y_la, 1
Given: a ray r0 defined by (screen space: sx0, sy0, z), where z ranges from 0 to infinity and sx0, sy0 are constant along the ray.
Find the intersection of the ray and (world space: XZ plane y=0). This point is the "handle location".
Given a new ray r1 defined by (screen space: sx1, sy1, z1)...
***Find new values for M_la (x_la, y_la) so the "handle location" intersects ray r1.
Motivation: As a finger touches a screen in screen space, the user grabs "ahold" of a "handle" location on the Y=0 plane. Further movements of the finger in screen space drag the handle around the screen by moving the camera in the XZ plane.
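One possible way to attack the one-finger problem, offered as a sketch and not as the required solution: because M_la only translates within the XZ plane, the world plane y=0 is also the plane y=0 of the LookAt frame. Both rays can therefore be unprojected with the inverse of everything except M_la, intersected with y=0, and the difference of the two hit points added to the old translation. All names below (mat_inv, unproject, ground_hit, drag_look_at, m_rest) are hypothetical, the column-vector convention of P' = M * P is assumed, and the two sample depths z_a and z_b are assumptions that must be valid depths for your projection.

```python
def mat_inv(m):
    """Invert a 4x4 matrix (nested lists) by Gauss-Jordan elimination."""
    n = 4
    a = [list(m[i]) + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def unproject(m_inv, p):
    """Map a homogeneous screen-space point back through m_inv, divide by w."""
    x, y, z, w = (sum(m_inv[i][k] * p[k] for k in range(4)) for i in range(4))
    if abs(w) < 1e-12:
        raise ValueError("depth maps to infinity; pick another sample depth")
    return (x / w, y / w, z / w)


def ground_hit(m_rest_inv, sx, sy, z_a=0.0, z_b=0.5):
    """Intersect the screen ray (sx, sy, z) with y = 0 in the LookAt frame.

    Returns the (x, z) coordinates of the hit point.
    """
    ax, ay, az = unproject(m_rest_inv, (sx, sy, z_a, 1.0))
    bx, by, bz = unproject(m_rest_inv, (sx, sy, z_b, 1.0))
    if abs(by - ay) < 1e-12:
        raise ValueError("ray is parallel to the ground plane")
    t = ay / (ay - by)  # solves ay + t * (by - ay) = 0
    return (ax + t * (bx - ax), az + t * (bz - az))


def drag_look_at(m_rest, x_la, y_la, touch0, touch1):
    """Return new (x_la, y_la) so the handle grabbed at touch0 lies under touch1.

    m_rest = M_ss * M_proj * M_ztrans * M_xrot * M_yrot (everything but M_la).
    """
    inv = mat_inv(m_rest)
    q0 = ground_hit(inv, *touch0)
    q1 = ground_hit(inv, *touch1)
    return (x_la + q1[0] - q0[0], y_la + q1[1] - q0[1])
```

The update is purely additive, so it can be applied every frame with the previous touch position as touch0, or once with the original touch-down position.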
Same as above, but a second finger touch to the screen is added to solve two new variables:
y_worldRotation, and zoomValue.
The following from problem 1 are now no longer constant:
M_ztrans, M_xrot, M_yrot
M_ztrans and M_xrot are specified by zoomValue.
M_ztrans has the following form:
1, 0, 0, 0
0, 1, 0, 0
0, 0, 1, 0
0, 0, z_ztrans, 1
where z_ztrans = Czm * zoomValue + Czb (0 < zoomValue < 1, Czm and Czb are constants that guarantee z_ztrans < 0)
M_xrot = rotation matrix around the X axis, where the rotation angle is xrot = Cxm * zoomValue + Cxb
(0 < zoomValue < 1, and Cxm and Cxb are constants that guarantee the final camera placement in world space is Y>0 with the camera aimed downwards.)
Motivation: As the camera is zoomed in, it moves closer to the target and is rotated more horizontally than vertically.
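The two linear mappings from zoomValue can be sketched as below. The function name zoom_to_camera is hypothetical, and the constants in the usage note are invented purely for illustration; the brief does not give values for Czm, Czb, Cxm or Cxb, nor the angle unit of xrot, so the angle is returned in whatever unit the constants use.

```python
def zoom_to_camera(zoom_value, c_zm, c_zb, c_xm, c_xb):
    """Return (z_ztrans, xrot) for a zoomValue strictly between 0 and 1."""
    if not 0.0 < zoom_value < 1.0:
        raise ValueError("zoomValue must satisfy 0 < zoomValue < 1")
    z_ztrans = c_zm * zoom_value + c_zb
    xrot = c_xm * zoom_value + c_xb
    if z_ztrans >= 0.0:
        # The brief requires constants that keep the push-back negative.
        raise ValueError("Czm and Czb must guarantee z_ztrans < 0")
    return z_ztrans, xrot
```

For example, with the made-up constants Czm = 40, Czb = -50, Cxm = -50, Cxb = 80, a zoomValue of 0.5 gives z_ztrans = -30 and xrot = 55: the camera moves closer and tilts toward the horizontal as zoomValue grows.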
***Given two initial screen space rays that create two "handles" in the Y=0 plane, for any new values for each screen space ray solve for x_la, y_la, y_worldRotation, and zoomValue to keep the handles intersected with the new rays.
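One way the two-finger problem might be decomposed, given as a sketch under stated assumptions and not as the expected answer: M_yrot and M_la together form a rigid motion of the ground plane, so they preserve the distance between the two handles. For a candidate zoomValue, unproject both new rays to the plane y=0 in the frame just before XRotation to get points q0 and q1; the zoomValue is right when |q1 - q0| equals the world-space distance |h1 - h0| between the handles, and the rotation and translation then follow in closed form. The function names are hypothetical, points are (x, z) pairs, column vectors are assumed so that q = R(theta) * (h + t), and the sign of theta relative to your engine's Y rotation is an assumption to verify.

```python
import math


def scale_residual(h0, h1, q0, q1):
    """Zero when the ray hits q0, q1 are as far apart as the handles h0, h1."""
    return (math.hypot(q1[0] - q0[0], q1[1] - q0[1])
            - math.hypot(h1[0] - h0[0], h1[1] - h0[1]))


def solve_zoom(residual, lo=1e-6, hi=1.0 - 1e-6, iterations=60):
    """Bisection for residual(zoomValue) = 0 on (0, 1).

    residual is a caller-supplied callback that rebuilds M_ztrans and M_xrot
    for a zoomValue, unprojects both rays and returns scale_residual(...).
    Assumes a single sign change on the interval.
    """
    f_lo = residual(lo)
    if f_lo * residual(hi) > 0:
        raise ValueError("no sign change: no zoomValue fits these touches")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = residual(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def fit_rotation_and_translation(h0, h1, q0, q1):
    """Find theta and t = (x_la, y_la) with q_i = R(theta) * (h_i + t)."""
    theta = (math.atan2(q1[1] - q0[1], q1[0] - q0[0])
             - math.atan2(h1[1] - h0[1], h1[0] - h0[0]))
    theta = math.atan2(math.sin(theta), math.cos(theta))  # wrap to (-pi, pi]
    c, s = math.cos(theta), math.sin(theta)
    # t = R(-theta) * q0 - h0
    tx = c * q0[0] + s * q0[1] - h0[0]
    tz = -s * q0[0] + c * q0[1] - h0[1]
    return theta, (tx, tz)
```

Bisection is used only because it is simple and robust; any 1D root finder would do, and the residual may have no root if the fingers are dragged beyond what the zoom range allows, in which case clamping zoomValue is one option.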
The job is done when your solution is working in my code (Unity3D)