Last updated on June 30, 2025
A prime model for visual AI, trained by experiments with diverse sensors
A model for AI with a human-like sense of physics: it predicts or estimates parameters such as spatial depth, tactile pressure, temperature, water depth, and acceleration from visual input alone. It can be trained with devices/robots/cars/humans equipped with cameras plus the specific sensors for those parameters!
Let the device/robot/car/human move, collide, push and throw objects, go into fire, go into water, and cause or take damage, generating training data that correlates visual input with the other parameters. That data trains the model to predict or estimate the values of those parameters from vision alone.
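To make the training-data idea concrete, here is a minimal sketch of what one labeled sample might look like: a camera frame paired with the sensor readings captured at the same instant. All field names and values are assumptions for illustration, not a fixed schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SensorFrame:
    """One moment of an experiment: the image is the model input,
    the sensor values are the regression targets."""
    timestamp_s: float          # capture time in seconds
    image_path: str             # path to the stored camera frame
    depth_m: float              # spatial depth from a depth sensor
    pressure_n: float           # tactile/contact pressure in newtons
    temperature_c: float        # ambient or contact temperature
    water_depth_m: float        # 0.0 when not submerged
    accel_ms2: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])

# A sample from a hypothetical collision experiment.
frame = SensorFrame(
    timestamp_s=12.5,
    image_path="run_003/frame_0125.png",
    depth_m=0.8,
    pressure_n=150.0,
    temperature_c=21.0,
    water_depth_m=0.0,
    accel_ms2=[0.0, -9.8, 2.3],
)
print(frame.pressure_n)  # 150.0
```

At training time, the prime model would see only `image_path`'s contents and learn to regress the other fields.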
Use the predicted parameters as control targets, creating robust, reliable, and clear control logic for AI models in applications like autonomous driving (for example, avoiding high pressure or temperature), instead of just letting the AI model mimic human behaviour visually without any sense of the underlying logic or reason.
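A minimal sketch of such explicit control logic, assuming the prime model returns its estimates as a plain dict; the limit values and function names are hypothetical, chosen only to show how a veto on predicted physical outcomes could sit outside the learned policy:

```python
# Hypothetical safety limits; the numbers are illustrative only.
MAX_PRESSURE_N = 1000.0
MAX_TEMPERATURE_C = 60.0

def action_is_safe(predicted: dict) -> bool:
    """Veto any candidate action whose predicted physical outcome
    exceeds an explicit limit, rather than trusting the policy's
    visual mimicry alone."""
    if predicted.get("pressure_n", 0.0) > MAX_PRESSURE_N:
        return False
    if predicted.get("temperature_c", 0.0) > MAX_TEMPERATURE_C:
        return False
    return True

# Light contact within limits is allowed...
print(action_is_safe({"pressure_n": 200.0, "temperature_c": 25.0}))   # True
# ...while a predicted hard impact is vetoed.
print(action_is_safe({"pressure_n": 1500.0, "temperature_c": 25.0}))  # False
```

Because the checks are plain thresholds on named physical quantities, this layer stays inspectable even when the policy behind it is a black box.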
This “prime” model can be applied to any visual AI application. For example, in video generation it can make generated videos much more compliant with general physics, in a way or sense or intuition like a human's.
Control is not the only purpose of this model; it is also meant to make AI's generated outputs more compliant with general physics.
For each experiment, measure the mass (weight) of the device/robot/car/human and record it in the generated dataset, and ideally add or label the masses of the interacting objects too, so the prime model learns to estimate and account for mass from visual input.
Interacting objects include not only the objects interacting with the device/robot/car/human but also objects interacting with each other, whether in the experiment itself or in any other scenario that can generate usable training data, such as synthetic scenarios.
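A sketch of what per-experiment mass labeling might look like, assuming a simple dict-based metadata record; every field name and value here is illustrative, not a proposed standard:

```python
# Hypothetical metadata attached to one experiment's dataset.
experiment = {
    "experiment_id": "push_test_017",
    "agent_mass_kg": 42.0,            # mass of the robot doing the pushing
    "objects": [                      # every labeled object in the scene,
        {"name": "crate", "mass_kg": 8.5},   # including objects that only
        {"name": "ball",  "mass_kg": 0.5},   # interact with each other
    ],
}

# Total labeled mass in the scene.
total = experiment["agent_mass_kg"] + sum(o["mass_kg"] for o in experiment["objects"])
print(total)  # 51.0
```

Labeling the passive objects as well lets object-to-object collisions in the footage contribute training signal, not just agent-to-object contact.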
Grok suggested a hybrid control pattern for FSD that combines a separate control layer with a reward-based approach; the separate control layer may be more understandable and controllable, and sits outside the black box of the learned pattern. Grok proposed a -100 reward for >0 N pressure; I proposed -1000 for >1000 N!
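The two reward proposals can be sketched as penalty functions. The thresholds and penalty values are from the discussion above; the step-function shape is my assumption:

```python
def grok_penalty(pressure_n: float) -> float:
    """Grok's version: any contact at all is penalized."""
    return -100.0 if pressure_n > 0.0 else 0.0

def author_penalty(pressure_n: float) -> float:
    """The author's version: only hard contact above 1000 N is penalized."""
    return -1000.0 if pressure_n > 1000.0 else 0.0

print(grok_penalty(50.0))      # -100.0 : any touch is punished
print(author_penalty(50.0))    # 0.0    : light contact is tolerated
print(author_penalty(1500.0))  # -1000.0: hard impact is punished heavily
```

The difference matters: penalizing any contact discourages all physical interaction, while penalizing only high pressure permits gentle contact and reserves the large penalty for genuinely dangerous forces.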
My impression: I used ChatGPT for the first time this January and have used Grok since then, and I've formed an impression of present text-generative AI. It is the most powerful thinking-assistance tool humans have ever had, because it is trained on vast public text knowledge (except secrets, like info about UFOs) and can correlate that knowledge into patterns based on statistical probabilities. But present text-generative AI cannot do truly original innovation or guess beyond known knowledge and patterns, especially ground-breaking or revolutionary ones. If you give it an innovative idea or clue, though, it can evaluate it, explore it, and even create a plan for it quite well!
I had this idea about “a prime model for visual AI trained by experiments with diverse sensors” and talked it over with Grok last week, and Grok did a great job helping! I've already posted the idea on my X.com account “@oknomad7564”, so I'd like to share it here too.