Last updated on July 23, 2025
I first posted this idea on my X.com account “@oknomad7564”, and I’d like to share it here.
A prime model for vision-related AI, trained by experiments with diverse sensors.
A prime model for AI with a human-like sense of physics: from visual data alone, it estimates and predicts physics parameters such as spatial depth, tactile pressure, temperature, water depth, and acceleration, for the present and for possible futures. It can be trained with a device/robot/car/human equipped with a camera plus sensors for those physics parameters!
Let the device/robot/car/human move, collide, push or throw things, enter fire, enter water, and damage or be damaged, to generate training data that correlates a visual video stream with the corresponding physics parameters.
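One way to structure each logged sample might look like the sketch below. The field names and units are my own assumptions, not from the post; the key idea is simply that every clip is synchronized with the sensor readings captured at the same moment:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PhysicsSample:
    """One synchronized training sample: a short video clip plus the
    physics parameters logged by the on-board sensors at the same time.
    (Illustrative schema only; field names are assumptions.)"""
    frames: List[bytes]          # encoded video frames (e.g. JPEG bytes)
    timestamp_s: float           # capture time in seconds
    depth_m: float               # spatial depth to the nearest obstacle
    pressure_n: float            # tactile/contact pressure in newtons
    temperature_c: float         # surface temperature in Celsius
    water_depth_m: float         # submersion depth, 0 if dry
    accel_ms2: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])

# example record from a dry, contact-free moment of an experiment
sample = PhysicsSample(frames=[], timestamp_s=0.0, depth_m=2.5,
                       pressure_n=0.0, temperature_c=21.0, water_depth_m=0.0)
```

A collection of such records, one per video clip, would form the supervised dataset the post describes.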
Train the prime model on the data above to estimate and predict the physics parameters of different objects, for the present and for possible futures, from the visual video stream.
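The training objective can be sketched in miniature: fit a map from visual features to a physics parameter by minimizing the error against sensor ground truth. A real prime model would be a large video network; this toy gradient-descent fit of a single feature (apparent object size) to depth only illustrates the supervised setup, and all specifics here are assumptions:

```python
def train_linear_estimator(features, targets, lr=0.02, epochs=2000):
    """Toy sketch: fit a linear map from a visual feature vector to one
    physics parameter (depth) by stochastic gradient descent on squared
    error against the logged sensor value."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, targets):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y                      # residual vs. sensor truth
            for i in range(dim):
                w[i] -= lr * err * x[i]         # gradient step on weights
            b -= lr * err                       # gradient step on bias
    return w, b

# toy data: larger apparent size -> smaller depth (perfectly linear here)
feats = [[1.0], [2.0], [3.0], [4.0]]
depths = [4.0, 3.0, 2.0, 1.0]
w, b = train_linear_estimator(feats, depths)
```

After training, the model can be queried at an unseen feature value; with this noiseless toy data it recovers the underlying linear relation.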
In control, the trained prime model estimates and predicts the physics parameters for the present and possible futures from the visual video stream, creating a robust, reliable, and transparent control logic for the AI model in applications like autonomous driving (for example, avoiding predicted high contact pressure, high temperature, or deep water), rather than letting the AI model merely mimic human behaviour visually with no sense of the underlying logic or reasons.
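A minimal sketch of what such a control check could look like, assuming the prime model returns a dictionary of predicted parameters. The threshold values are purely illustrative assumptions, not from the post:

```python
def safe_to_proceed(pred, max_pressure_n=1000.0, max_temp_c=60.0,
                    max_water_depth_m=0.3):
    """Veto an action when the prime model predicts hazardous physics
    parameters ahead. Thresholds are illustrative assumptions."""
    if pred["pressure_n"] > max_pressure_n:
        return False, "predicted contact pressure too high"
    if pred["temperature_c"] > max_temp_c:
        return False, "predicted temperature too high"
    if pred["water_depth_m"] > max_water_depth_m:
        return False, "predicted water too deep to ford"
    return True, "ok"

# a predicted puddle 0.5 m deep should veto the planned path
ok, reason = safe_to_proceed({"pressure_n": 0.0, "temperature_c": 25.0,
                              "water_depth_m": 0.5})
```

Because the checks run on named physical quantities rather than raw pixels, each veto comes with a human-readable reason, which is exactly the transparency argument made above.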
Control is not the only purpose of this model; the prime model can be applied to any vision-related AI application. For example, in video generation it can estimate the physics parameters of the generated footage, making the output far more compliant with general physics, in a sense or intuition like a human’s.
In each experiment, measure the mass or weight of the device/robot/car/human and record it in the generated dataset; it would be better to also label the mass or weight of all other interacting objects, so the prime model can learn to estimate and account for weight from visual data alone.
Besides physical experiments, synthetic scenarios can reduce training cost and time significantly.
Grok suggested a hybrid control pattern for FSD, combining a separate control layer with a reward approach; the separate control layer may be more understandable and controllable, staying outside the black box of the learned pattern. Grok: -100 reward for >0 N pressure. Me: -1000 reward for >1000 N!
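The two penalties mentioned above could be combined into a single piecewise reward. Note that stacking them this way is my own illustrative interpretation of the exchange, not something the post specifies:

```python
def collision_reward(pressure_n):
    """Sketch of the reward shaping discussed above: any contact
    (>0 N) gets -100, and a hard impact (>1000 N) gets a further
    -1000 on top. Combining the two is an assumption for illustration."""
    reward = 0.0
    if pressure_n > 0.0:
        reward -= 100.0          # any contact is penalized
    if pressure_n > 1000.0:
        reward -= 1000.0         # hard impacts are penalized much more
    return reward
```

Feeding this reward the prime model's *predicted* pressure, rather than a measured one, is what would let the policy avoid contact before it happens.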
PS: in applications like autonomous driving, robotics, or drones, besides the visual sensor, the prime model can use a supplementary spatial-depth sensor such as a stereo (two-eye) camera or mmWave radar.
PS2: yeah, add a microphone to the device/robot/car/human in the experiments too, to create a correlation between vision and sound as well; the physics parameters may then include spatial depth, tactile pressure, temperature, water depth, acceleration, mass, and sound.