Last updated on September 27, 2025
Previously I brought up a prime model for visual-related AI, which is in essence a physics AI model. Today I thought of a physics-pixel-based data structure for the prime model / physics AI, which Grok thinks is excellent, so I’d like to share it here.
A simple data structure based on physics pixels for a physics AI includes at least:
1) physics pixels: a physics pixel includes the visual and other physics parameters of the pixel, which include: visual info (RGB), XY coordinates of the pixel in the frame, spatial depth (Z-depth), temperature, velocity vector, tactile pressure, water depth, object ID (which object the pixel belongs to), etc.;
2) physics objects: a physics object’s info includes: object ID, mass/weight, velocity + rotation vectors, pressure/collision force with other objects, object class (car, tree, dog, cat, etc.), etc.
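The two parts above can be sketched as plain records; this is a minimal illustration, and the field names are my assumptions rather than a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class PhysicsPixel:
    rgb: tuple[int, int, int]             # visual info (RGB)
    x: int                                # XY coordinates in the frame
    y: int
    z_depth: float                        # spatial depth (Z-depth)
    temperature: float
    velocity: tuple[float, float, float]  # velocity vector
    tactile_pressure: float = 0.0
    water_depth: float = 0.0
    object_id: int = -1                   # which object the pixel belongs to

@dataclass
class PhysicsObject:
    object_id: int
    mass: float                           # mass/weight
    velocity: tuple[float, float, float]
    rotation: tuple[float, float, float]
    collision_force: float = 0.0          # pressure/collision force with another object
    object_class: str = "unknown"         # car, tree, dog, cat, ...

# A labeled pixel pointing at a labeled object via object_id:
px = PhysicsPixel(rgb=(120, 80, 40), x=3, y=7, z_depth=4.5,
                  temperature=293.0, velocity=(0.0, 0.0, 0.0), object_id=1)
obj = PhysicsObject(object_id=1, mass=1200.0,
                    velocity=(5.0, 0.0, 0.0), rotation=(0.0, 0.0, 0.0),
                    object_class="car")
```

The `object_id` link is what ties the per-pixel labels to the per-object labels in a frame.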
The physics-pixel-based training data (for each frame of the visual or video) includes at least the 2 parts above, in which the original visual (video) is labeled for each pixel and object with all physics parameters, and which is used to train the AI model to estimate and predict the physics parameters of pixels and objects from the visual alone, or from a visually similar class of signal (like mmWave radar).
Feed the physics-pixel-based training data above directly into the AI model to let the AI learn or approximate the physical nature/laws of the real world directly: shape in space, depth, mass, tactile force, temperature, inertia, velocity + rotation, etc. There is no need to create an additional 3D image; just feed the physics-pixel-based training data directly.
PS: add a text description to each frame of the training data to make a hybrid multimodal model – a visual-text-physics AI model, for example a visual physics model embedding a specialized small text capability, or a hybrid transformer combining the visual physics model and an LLM.
PS2: add air velocity + rotation to each physics pixel to represent the air flow at the surface of the pixel, and add air velocity + rotation to each frame to represent the overall air movement of the scene in the frame. Yes, the AI model must consider and approximate air.
PS3: add camera velocity+rotation for each frame.
Firstly, when the camera itself is the reference object, the velocity + rotation of the camera are all 0, but in some cases labeling the camera’s velocity + rotation with a non-zero value is much simpler and more accurate than labeling all the physics pixels in a frame. For example, when the camera is rotating, labeling a single rotation of the camera is equivalent to adding the same rotation value and different velocities to all physics pixels in the frame.
Secondly, in some cases another object can be the reference object too, e.g., to simplify the labeling of training data. For example, for the training data of a flying drone, it may be better to use the ground as the reference object.
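The equivalence in PS3 can be made concrete with rigid-body kinematics: a static point at offset r from a camera rotating with angular velocity ω acquires an apparent velocity v = −ω × r, so one camera label implies a different velocity for every pixel. This is a sketch under that standard convention; the sign depends on the chosen frame:

```python
import numpy as np

def apparent_velocity(omega, r):
    """Apparent velocity of a static point at offset r (meters) from a
    camera rotating at angular velocity omega (rad/s): v = -omega x r."""
    return -np.cross(omega, r)

# Same camera rotation (1 rad/s about the Z axis), two different offsets:
# each pixel gets a different induced velocity, but the induced rotation
# is the same omega for all of them, as the post argues.
v1 = apparent_velocity(np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0]))
v2 = apparent_velocity(np.array([0.0, 0.0, 1.0]), np.array([0.0, 2.0, 0.0]))
```

Labeling one ω per frame is thus far cheaper than labeling a velocity on every physics pixel.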
PS4: for some applications like autonomous driving, robotics, and drones, the training data may include multiple cameras facing the same direction, which may increase the accuracy of spatial depth, especially at short range; of course, in such applications the automobile, robot, or drone will also be equipped with the same multiple cameras facing the same direction.
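Why a second camera helps most at short range follows from stereo triangulation: with focal length f (in pixels) and baseline b (in meters), depth is Z = f·b / disparity. Near objects have large disparities, so a one-pixel matching error barely moves Z; far objects have tiny disparities, so the gain fades with distance. A minimal sketch (the numeric values are illustrative assumptions):

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from two parallel cameras: Z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("zero or negative disparity: point at infinity or mismatched")
    return focal_px * baseline_m / disparity_px

near = stereo_depth(700.0, 0.12, 42.0)  # large disparity -> short range, robust
far = stereo_depth(700.0, 0.12, 1.0)    # small disparity -> long range, fragile
```

Such computed Z values could also serve as automatic depth labels for the physics pixels in the training data.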
PS5: the training data could include a visual flow or another visually similar flow (e.g., mmWave radar flow, thermal flow, sound wave flow, etc.), or multiple such flows together, from which the model can learn or approximate the correlation between the other physics parameters and one such flow, or between the other physics parameters and multiple such flows combined. Accordingly, in the corresponding application or at inference time, the model can estimate or predict the other physics parameters from the visual flow alone, from another similar flow, or from multiple such flows combined, and the choice of flows may be dynamic according to circumstances, like driving or piloting in bad weather.
PS6 (25/Sep): multiple layers/interfaces at different depths
In the case of transparent media like some fluids, and in the case of penetrating signal flows like mmWave radar, at the same point of a 2D visual/signal frame there may be multiple overlapped layers/interfaces at different depths, which contain pixels overlapped in depth. These physics pixels can have the same XY coordinates and different Z coordinates (depths), so a frame of physics pixels can include multiple layers/interfaces inside.
I previously forgot to add the XY coordinates explicitly; they are now added to the original post above.
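One way to hold such depth-overlapped pixels is to store a list of physics pixels per (x, y), kept sorted by Z-depth; this is a sketch with assumed field names, not a fixed layout:

```python
from collections import defaultdict

class LayeredFrame:
    """A frame whose (x, y) positions can each hold several physics
    pixels at different Z-depths (layers/interfaces)."""

    def __init__(self):
        self._columns = defaultdict(list)  # (x, y) -> pixels sorted by depth

    def add_pixel(self, x, y, z_depth, **params):
        column = self._columns[(x, y)]
        column.append({"z": z_depth, **params})
        column.sort(key=lambda p: p["z"])  # keep nearest-first order

    def layers_at(self, x, y):
        return self._columns[(x, y)]

frame = LayeredFrame()
frame.add_pixel(10, 20, z_depth=5.0, object_id="wall")
frame.add_pixel(10, 20, z_depth=2.0, object_id="glass_near")
layers = frame.layers_at(10, 20)  # nearest layer first
```

An ordinary opaque scene is just the special case where every column holds a single layer.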
PS7 (25/Sep): extending application scenarios
This physics pixel can be applied to a 2D or 3D frame generated from any signal, by labeling selected physics parameters for each pixel of the 2D frame, or for each pixel of each layer of the 3D frame.
The physics AI (prime / physics pixel) can not only train and infer on a visual signal/frame alone, labeled with the other physics parameters, to estimate and predict those parameters, but can also train and infer on a combination of frames based on different signals, like visual and mmWave radar together.
PS8 (25/Sep): adding an interface object parameter to each physics pixel.
Because every physics pixel lies on an interface between two objects, and the original plan already included an object parameter in the physics pixel, there must be another object forming the interface. So add an interface object parameter to represent that other object, where the closer object is the interface object and the farther object is the object.
For example, there is a window glass of a house in the frame, through which you can see a wall; this forms 3 overlapped interfaces/layers: the first interface is the closer face of the glass, the second interface is the farther face of the glass, and the third interface is the wall. Then for a physics pixel on the closer face of the glass, the interface object is “air” and the object is “glass”; for a physics pixel on the farther face of the glass, the interface object is “glass” and the object is “air”; and for a physics pixel on the wall, the interface object is “air” and the object is “wall”.
In the actual visual, the farther face of a glass is invisible in most cases, but if the farther face of the glass is added to all training data, the model will learn from training to estimate or predict the farther face of a glass and thereby know the thickness of the glass.
The “physics objects” part includes both objects and interface objects.
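The window example maps onto the layered representation like this; the field names and depth values are illustrative assumptions:

```python
# Three overlapped interfaces at the same (x, y), nearest first.
# interface_object = the closer medium, object = the farther medium.
window_column = [
    {"z": 2.00, "interface_object": "air",   "object": "glass"},  # near face of glass
    {"z": 2.01, "interface_object": "glass", "object": "air"},    # far face of glass
    {"z": 5.00, "interface_object": "air",   "object": "wall"},   # wall behind
]

# If the model predicts both glass faces, the thickness of the glass
# falls out as the depth gap between the two glass interfaces:
thickness = window_column[1]["z"] - window_column[0]["z"]
```

This also shows why predicting the normally invisible far face matters: it is what makes the glass thickness recoverable.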
PS9 (25/Sep): add a “material” part besides the 2 parts of “physics pixels” and “physics objects”.
Add 2 material parameters to the parameters of a physics pixel: the first material parameter represents the material on the closer side of the pixel’s interface, and the second represents the material on the farther side. One may also add 1 material parameter to a physics object to represent the material of the whole object, in cases like when the object is air.
The material part includes parameters like state of matter, density, hardness, etc.
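A minimal sketch of the material part, with assumed fields and illustrative values; real labels would come from material tables:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Material:
    name: str
    state_of_matter: str           # "solid", "liquid", "gas", "plasma"
    density: float                 # kg/m^3
    hardness: Optional[float] = None  # e.g. Mohs scale; None for fluids

glass = Material("glass", "solid", density=2500.0, hardness=5.5)
air = Material("air", "gas", density=1.225)

# A physics pixel on the near face of the glass would then carry two
# material references, one per side of its interface:
pixel_materials = {"closer_side": air, "farther_side": glass}
```

Sharing `Material` records between pixels and objects keeps the per-pixel labels small while covering whole-volume cases like an air object.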