
a visual text physics model

This visual text physics model is a hybrid of a visual model and a text model. One example is a “CNN+transformer” hybrid in which the CNN processes the visual flow and generates a sequence of tokens, which are sent to the transformer together with synced physics parameters and text describing the visual flow. The model can be trained on synced visual flow, physics parameters (measured in experiments, or calculated when the training data comes from simulation) and text describing the visual flow. Through training, the model learns the correlation between visual flow, physics parameters and description text. It can then generate the description text and estimated physics parameters for the present visual flow, and it can even learn to predict the physics parameters and description text for the possible future of a visual flow.
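To make "synced" training data more concrete, here is a minimal sketch in Python of how the three streams might be paired up by timestamp before training. Everything in it is my own assumption for illustration: the names `SyncedSample`, `nearest` and `build_samples`, the `max_gap` threshold, and the idea of pairing each frame with the closest sensor reading and the latest caption. A real pipeline could sync the streams in other ways.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class SyncedSample:
    """One hypothetical training example: a frame with its physics label and text."""
    t: float        # frame timestamp in seconds
    frame_id: int   # index of the video frame (stands in for the pixel data)
    physics: dict   # e.g. {"depth_m": 1.2}, measured or taken from simulation
    text: str       # description of the visual flow at this moment


def nearest(timestamps, t):
    """Return the index of the timestamp closest to t (timestamps must be sorted)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier reading.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def build_samples(frame_times, sensor_log, captions, max_gap=0.05):
    """Pair each frame with the closest sensor reading and the latest caption.

    sensor_log: list of (timestamp, physics_dict), sorted by timestamp
    captions:   list of (start_timestamp, text), sorted by timestamp
    Frames without a sensor reading within max_gap seconds, or without any
    caption that has started yet, are dropped (an assumed policy).
    """
    sensor_times = [t for t, _ in sensor_log]
    caption_times = [t for t, _ in captions]
    samples = []
    for frame_id, t in enumerate(frame_times):
        j = nearest(sensor_times, t)
        if abs(sensor_times[j] - t) > max_gap:
            continue  # no trustworthy physics label for this frame
        k = bisect_right(caption_times, t) - 1  # latest caption starting at or before t
        if k < 0:
            continue  # no description text available yet
        samples.append(SyncedSample(t, frame_id, sensor_log[j][1], captions[k][1]))
    return samples
```

Each `SyncedSample` would then give the model one visual input together with the physics parameters and the text it should learn to correlate with that input.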

If this visual text physics model is trained well, it can learn a strong correlation between visual flow, physics parameters and text. For example, the model may learn to use physics formulas and calculations written in text to help analyze the physics parameters it estimates for a visual flow. Vice versa, the model may learn to estimate the physics parameters of a visual flow to check whether the text generating the visual flow is reasonable and accurate.
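As a toy illustration of this kind of cross-check, the sketch below estimates an acceleration from object positions tracked in a visual flow and compares it with a value claimed in text. This is hand-written code, not something the model itself would run, and the function names, the second finite difference estimator and the 10% tolerance are all my assumptions for the example.

```python
def estimate_acceleration(positions_m, dt):
    """Estimate constant acceleration from positions sampled every dt seconds.

    Uses the mean second finite difference: (x[i+1] - 2*x[i] + x[i-1]) / dt**2.
    """
    if len(positions_m) < 3:
        raise ValueError("need at least three positions")
    second_diffs = [
        (positions_m[i + 1] - 2 * positions_m[i] + positions_m[i - 1]) / dt**2
        for i in range(1, len(positions_m) - 1)
    ]
    return sum(second_diffs) / len(second_diffs)


def is_consistent(claimed, estimated, rel_tol=0.1):
    """True if the estimate is within rel_tol (relative) of the claimed value."""
    return abs(claimed - estimated) <= rel_tol * abs(claimed)


# Hypothetical tracked positions of a falling object, generated here from
# x = 0.5 * g * t**2 so the example is self-contained.
g = 9.81
dt = 0.1
positions = [0.5 * g * (i * dt) ** 2 for i in range(5)]

estimated = estimate_acceleration(positions, dt)
text_says_free_fall = is_consistent(9.81, estimated)  # the text claims free fall
text_says_slow_drift = is_consistent(2.0, estimated)  # the text claims 2 m/s^2
```

Here the free-fall claim agrees with the estimate and the 2 m/s^2 claim does not, which is the sort of signal the model could use to judge whether a text description of a visual flow is physically reasonable.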

This visual text physics model is an upgrade of the model in my previous post, “a prime model for visual related AI and trained by experiments with diverse sensors”. That post introduced how to train a model that correlates visual flow with physics parameters using experiment data, including “a prime model for AI, with human sense of physics to estimate and predict the physics parameters like spatial depth, tactile, temperature, water depth and acceleration for present and possible future from only visual data, which can be trained by device/robot/car/human equipped with specific sensors for visual and the physics parameters”.

This visual text physics model can also be, or evolve from, the “visual model with a limited LLM” mentioned in another previous post of mine, “a multimode AI model comprising a visual model with a specific grammar format text interface connecting a text language model directly”. As I mentioned in that post, this “visual model with a specific grammar format text interface” can be a “CNN+transformer”, which can evolve into a complete model that processes visual flow, physics parameters and text language all together by itself, without connecting to another text language model.
