In most previous posts, I simply forgot to consider the case of multiple cameras (or multiple sensors of the same kind) on one observer (like a car or robot), and since the single-camera case only suits video generative applications, here are some simple ideas for the multi-camera (or multi-sensor-of-same-kind) case.
Each camera’s frame or physics pixel frame is based on its own perspective coordinate system, so each frame of a camera needs a camera ID added at its start or end to identify which camera the frame is from. Then it’s all OK, and everything else can be left to the neural network of the AI model. No matter how the multiple cameras are configured, whether same-directional, different-directional, or omnidirectional, after training on the labeled and unlabeled frame flow the AI model can learn the correlations between the cameras by itself, haha, this is the real magic that makes AI so great.
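A minimal sketch of this tagging idea (the token names and `NUM_CAMERAS` here are hypothetical, just to make the shape of the data concrete):

```python
NUM_CAMERAS = 6  # hypothetical rig size
CAMERA_ID_TOKENS = {i: f"<cam_{i}>" for i in range(NUM_CAMERAS)}

def tag_frame_tokens(frame_tokens, camera_id, at_start=True):
    """Prepend (or append) a camera-ID token to one frame's token
    sequence, so the model can tell which camera the frame came from."""
    tag = CAMERA_ID_TOKENS[camera_id]
    return [tag] + frame_tokens if at_start else frame_tokens + [tag]

def interleave_synced_frames(frames_by_camera):
    """Merge the synced frames of all cameras into one training stream,
    each frame carrying its own camera-ID tag."""
    stream = []
    for camera_id, frame_tokens in sorted(frames_by_camera.items()):
        stream.extend(tag_frame_tokens(frame_tokens, camera_id))
    return stream
```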
There shall be only one single rectangular coordinate system corresponding to all the different perspective coordinate systems of all cameras of one observer. Through proper training, the AI model can learn the intermapping relation between each perspective coordinate system and the single rectangular coordinate system.
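For reference, the classical form of this intermapping is a pinhole back-projection plus a rigid transform; a minimal sketch, assuming each camera's intrinsics K and extrinsics (R, t) relative to the observer are known:

```python
import numpy as np

def pixel_to_observer_point(u, v, depth, K, R, t):
    """Map a pixel (u, v) with known depth from one camera's perspective
    coordinate system into the observer's single rectangular coordinate
    system. This is the geometric mapping the AI model would effectively
    have to learn end to end."""
    # Back-project the pixel into the camera frame at the given depth.
    p_cam = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Rigid transform from the camera frame to the observer's frame.
    return R @ p_cam + t
```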
Theoretically, no two cameras can have exactly the same position and direction, so no two cameras can produce the same frame at the same time, which means each pixel or physics pixel in a frame of one camera must be different from all pixels or physics pixels in the synced frames of all other cameras.
So a simple idea is: in the single rectangular coordinate system, map each physics pixel of each camera's frame to a 3D point that is different from the mapped 3D points of all other physics pixels in the synced frames of all cameras. In training, map all the different physics pixels to different 3D points of the cloud in the single rectangular coordinate system, so that the model learns to map them the same way at inference; the difference between mapped 3D points can be very small, just enough to keep them distinct, as sketched below.
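A sketch of building that all-distinct point cloud as a training target; the calibration format and the EPS offset value are assumptions, and the tiny per-camera nudge only exists to guarantee distinctness:

```python
import numpy as np

EPS = 1e-6  # tiny per-camera offset, just to keep points distinct

def frames_to_point_cloud(synced_frames, calibrations):
    """Map every physics pixel of all synced frames into one point cloud
    in the observer's rectangular coordinate system, nudging each
    camera's points by a camera-specific offset so no two mapped points
    from different cameras coincide exactly."""
    points = []
    for camera_id, (pixels, depths) in synced_frames.items():
        K, R, t = calibrations[camera_id]
        K_inv = np.linalg.inv(K)
        for (u, v), d in zip(pixels, depths):
            p = R @ (d * K_inv @ np.array([u, v, 1.0])) + t
            points.append(p + EPS * camera_id)  # minute separation
    return np.stack(points)
```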
For fusion of different kinds of sensor signals, a simple idea is: just map a secondary signal like mm-wave radar onto every physics pixel frame or perspective coordinate system of the basic/first signal like the visual cameras, or map the secondary signal directly into the single rectangular coordinate system of the same observer.
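A sketch of the first option, projecting radar returns onto one camera's physics pixel frame; the radar-to-camera extrinsics (R_rc, t_rc) and camera intrinsics K are assumed known from calibration:

```python
import numpy as np

def project_radar_to_camera(radar_points, R_rc, t_rc, K):
    """Project mm-wave radar points (N x 3, in the radar's frame) onto
    one camera's pixel plane, so radar returns can be fused as an extra
    per-pixel channel alongside the camera frame."""
    cam_pts = (R_rc @ radar_points.T).T + t_rc   # radar frame -> camera frame
    in_front = cam_pts[:, 2] > 0                 # keep points ahead of the camera
    cam_pts = cam_pts[in_front]
    uvw = (K @ cam_pts.T).T                      # perspective projection
    uv = uvw[:, :2] / uvw[:, 2:3]                # normalize to pixel coordinates
    return uv, cam_pts[:, 2]                     # pixel positions and their depths
```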