
branch networks by combining hidden states along columns

I previously suggested branch networks based on an intermediate omnitoken, but after studying the basic theory of neural networks and the transformer, I found they should instead be branch networks that combine (concatenate) hidden states along columns (the feature dimension).

Suppose there are 3 prebranch networks feeding one main network:
first, make the output hidden-state matrices of the 3 prebranch networks, H1[i, j1], H2[i, j2] and H3[i, j3], share the same row number i;
second, concatenate H1, H2 and H3 along columns into one matrix Hc[i, j1+j2+j3];
third, set the row number (input dimension) of the first layer's weight matrix of the main network to j1+j2+j3;
fourth, feed Hc into the main network.
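The four steps above can be sketched with NumPy; the sizes i, j1, j2, j3 and the 64-unit first layer are illustrative assumptions, not from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

i = 8                      # shared row number of all prebranch outputs
j1, j2, j3 = 16, 32, 24    # feature widths (columns) of the prebranches

# Step 1: the three prebranch hidden-state matrices share row number i
H1 = rng.standard_normal((i, j1))
H2 = rng.standard_normal((i, j2))
H3 = rng.standard_normal((i, j3))

# Step 2: concatenate along columns -> Hc[i, j1+j2+j3]
Hc = np.concatenate([H1, H2, H3], axis=1)
assert Hc.shape == (i, j1 + j2 + j3)

# Step 3: first-layer weight matrix of the main network
# has j1+j2+j3 rows (its input dimension)
W = rng.standard_normal((j1 + j2 + j3, 64))

# Step 4: feed Hc into the main network's first layer
out = Hc @ W

# Slicing Hc back recovers each prebranch's features unchanged,
# i.e. concatenation fully preserves them for the main network
assert np.array_equal(Hc[:, :j1], H1)
```

The `assert` at the end makes the preservation property concrete: each prebranch occupies its own column block of Hc, untouched by the others.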

The combined width j1+j2+j3 fully preserves each prebranch's features for the main network.

So, based on these branch networks that combine hidden states along columns, you can realize deeper specialization per modality in multimodal models, and the fixed-dynamic networks for continual learning that I introduced in earlier posts.
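As a hedged PyTorch sketch of the fixed-dynamic idea (the module names and sizes here are my own assumptions, not from the earlier posts): one frozen prebranch keeps earlier-task features fixed while a trainable prebranch adapts to new tasks, and both are concatenated column-wise for the main network:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: "fixed" is a prebranch trained on earlier tasks,
# "dynamic" is a prebranch free to adapt; sizes are illustrative.
fixed = nn.Linear(20, 16)
for p in fixed.parameters():
    p.requires_grad_(False)     # freeze: its features never change

dynamic = nn.Linear(20, 16)     # trainable prebranch for new tasks
main = nn.Linear(16 + 16, 4)    # first-layer input width = sum of columns

x = torch.randn(8, 20)
# combine the two prebranches' hidden states along columns (dim=1)
Hc = torch.cat([fixed(x), dynamic(x)], dim=1)
out = main(Hc)
```

Only `dynamic` and `main` would receive gradient updates during continual training, so the frozen columns of Hc stay a stable record of what was learned before.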

Published in Uncategorized
