Last updated on October 28, 2025
The view of a camera is distorted by perspective, just like the view of the human eye, but the human brain's neural network corrects the eye's view so that we perceive an accurate 3D spatial sense. A 3D model such as a 3D point cloud, by contrast, neither has nor depends on a perspective view. So a physics pixel frame can be projected to a 3D point cloud, helping the AI model approximate and understand the spatial nature of the real world more accurately.
For the 3D point cloud of a physics pixel frame, the correspondence is one-to-one: each physics pixel of the frame corresponds to exactly one 3D point of the cloud, and each 3D point of the cloud corresponds to exactly one physics pixel of the frame. Each 3D point has a coordinate (X', Y', Z') that represents its spatial position in a standard 3D coordinate system (with the observer or some fixed point as the origin). Each physics pixel has a frame coordinate (X, Y, Z): X and Y are the pixel's coordinates in the 2D view of the frame, and Z is the pixel's depth, i.e. the distance from the physics pixel to the camera.
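As a minimal sketch of this projection, assuming a pinhole camera with known intrinsics fx, fy, cx, cy (none of which appear in the original), and treating Z as depth along the optical axis with the camera as the origin of the standard 3D coordinate system:

```python
import numpy as np

def backproject(frame_coords, fx, fy, cx, cy):
    """Map frame coordinates (X, Y, Z) to 3D coordinates (X', Y', Z').

    frame_coords: (..., 3) array; [..., 0:2] are the pixel's 2D frame
    coordinates and [..., 2] is its depth Z. For simplicity Z is treated
    as depth along the optical axis, and the camera is the origin.
    """
    X, Y, Z = frame_coords[..., 0], frame_coords[..., 1], frame_coords[..., 2]
    Xp = (X - cx) / fx * Z          # X'
    Yp = (Y - cy) / fy * Z          # Y'
    return np.stack([Xp, Yp, Z], axis=-1)   # one 3D point per physics pixel
```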
The coordinates (X', Y', Z') of each 3D point can be stored among the parameters of the corresponding physics pixel, or the (X', Y', Z') coordinates of all 3D points, paired with the (X, Y, Z) coordinates of their corresponding physics pixels, can be appended as a whole block to the start or end of the physics pixel frame.
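The two layouts might look like the following sketch (array shapes and function names are illustrative, not from the original):

```python
import numpy as np

# Layout 1: store (X', Y', Z') among each physics pixel's own parameters.
def embed_per_pixel(frame_params, cloud):
    # frame_params: (H, W, C) per-pixel parameters; cloud: (H, W, 3)
    return np.concatenate([frame_params, cloud], axis=-1)   # (H, W, C + 3)

# Layout 2: collect all (X, Y, Z)/(X', Y', Z') pairs into one (H*W, 6)
# block to be appended to the start or end of the serialized frame.
def cloud_block(frame_coords, cloud):
    return np.concatenate([frame_coords.reshape(-1, 3),
                           cloud.reshape(-1, 3)], axis=-1)
```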
In the training data, label each physics pixel of a frame with the (X', Y', Z') of its corresponding 3D point, and during training make the AI model learn to generate (X', Y', Z') from the original 2D visual frame.
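A minimal per-pixel regression objective, assuming a PyTorch model that maps the 2D frame to a dense (X', Y', Z') map (the L1 loss and the tensor shapes are my assumptions, not specified in the original):

```python
import torch.nn.functional as F

def cloud_loss(model, frame_2d, cloud_label):
    """frame_2d: (B, C, H, W) original 2D visual frames.
    cloud_label: (B, 3, H, W) labeled (X', Y', Z') per physics pixel."""
    pred = model(frame_2d)                 # (B, 3, H, W) generated cloud
    return F.l1_loss(pred, cloud_label)    # per-pixel regression error
```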
There are a near object and a far object in the parameters of a physics pixel, so each physics pixel in the frame, and its corresponding 3D point, has a direction: it points from the near object to the far object, perpendicular to the interface between them, and can be represented by a vector such as (0, 0, 1). The (X, Y, Z) coordinates of a physics pixel can therefore be extended to a directional coordinate, frame_position(X, Y, Z)/frame_direction(0, 0, 1), and the corresponding 3D point likewise gets a directional coordinate 3D_position(X', Y', Z')/3D_direction(U', V', W'), in which (X', Y', Z') is the position of the 3D point and (U', V', W') is the direction vector from the near object to the far object.
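One way to realize this (an interpretation on my part, not spelled out in the original) is to take the 3D near-to-far direction as the unit vector along the viewing ray through the pixel, since the far object lies behind the near object along that ray; in the frame itself this direction reduces to the constant (0, 0, 1). Intrinsics are as in the back-projection sketch above:

```python
import numpy as np

def directional_coordinates(frame_coords, fx, fy, cx, cy):
    """Return (frame_direction, 3D_direction) for each physics pixel."""
    X, Y = frame_coords[..., 0], frame_coords[..., 1]
    # Viewing ray through the pixel, normalized to unit length.
    ray = np.stack([(X - cx) / fx, (Y - cy) / fy, np.ones_like(X)], axis=-1)
    dir3d = ray / np.linalg.norm(ray, axis=-1, keepdims=True)  # (U', V', W')
    # In the frame, the near-to-far direction is always (0, 0, 1).
    frame_dir = np.broadcast_to(np.array([0.0, 0.0, 1.0]), dir3d.shape)
    return frame_dir, dir3d
```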
Use mm or µm as the length unit of the 3D point cloud, so that all X', Y', Z' are integers. Normalize (U', V', W') to unit length, so that its components lie within [−1, 1]; this makes it easier for the AI model to differentiate (X', Y', Z') from (U', V', W').
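A sketch of the unit convention (the meters-as-input assumption is mine):

```python
import numpy as np

def quantize_positions(cloud_m, unit="mm"):
    """Convert positions from meters to mm or um and round, so that
    X', Y', Z' are integers while the unit-length direction components
    (U', V', W') remain fractions, keeping the two easy to tell apart."""
    scale = {"mm": 1_000, "um": 1_000_000}[unit]
    return np.rint(cloud_m * scale).astype(np.int64)
```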
And the AI model can also learn the adjacency of 3D points from the adjacency of the corresponding physics pixels in the physics pixel frame.
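For instance, a 4-neighbour adjacency over the pixel grid carries over directly to the 3D points (a sketch; the choice of 4-connectivity is my assumption):

```python
import numpy as np

def grid_adjacency(H, W):
    """Edges between 3D points inherited from the physics pixel grid:
    two points are adjacent iff their pixels are 4-neighbours in the frame."""
    idx = np.arange(H * W).reshape(H, W)
    horizontal = zip(idx[:, :-1].ravel(), idx[:, 1:].ravel())
    vertical = zip(idx[:-1, :].ravel(), idx[1:, :].ravel())
    return list(horizontal) + list(vertical)   # (point_i, point_j) pairs
```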
With the directional coordinates above, the AI model can learn and approximate the near/far objects and the corresponding interface not only in the physics pixel frame but also in the 3D point cloud.
With its corresponding 3D point cloud, a physics pixel frame has two coordinate systems: 1) the frame coordinate system (frame_position(X, Y, Z)/frame_direction(0, 0, 1)), which is a perspective coordinate system, and 2) the 3D coordinate system (3D_position(X', Y', Z')/3D_direction(U', V', W')), which is a rectangular coordinate system. Any position in the physics pixel frame can therefore be labeled by either or both of the two coordinate systems. For example, in the fusion of different kinds of sensor signals mentioned previously, a position such as the source of a sound may be represented by a 3D position and 3D direction.
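A possible container for such dual labels (the field names and the example values are purely illustrative):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class DualCoordinate:
    """A position labeled in either or both coordinate systems."""
    frame_position: Optional[Tuple[int, int, int]] = None     # (X, Y, Z)
    frame_direction: Tuple[int, int, int] = (0, 0, 1)
    position_3d: Optional[Tuple[int, int, int]] = None        # (X', Y', Z')
    direction_3d: Optional[Tuple[float, float, float]] = None # (U', V', W')

# e.g. a sound source localized only by the 3D system, with no pixel
# footprint in the visual frame (values invented for illustration):
siren = DualCoordinate(position_3d=(12_400, -3_050, 55_000),
                       direction_3d=(0.21, -0.05, 0.98))
```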
Although I believe that physics pixels without an explicit corrected 3D model (a 3D point cloud) may handle most control applications well through implicit correlations if trained properly, an explicit 3D point cloud may considerably improve performance and accuracy in all applications, especially in applications like video generation.