Last updated on May 2, 2026
I made a mistake earlier: the combination should be along columns, which is the same as omnitoken. I had this right at the beginning but then got confused; now it is clear.
Here is an example of combining along columns, supposing there are 3 prebranch networks before a main network:
first, make the output hidden-state matrices of the 3 prebranch networks, H1[i, j1], H2[i, j2] and H3[i, j3], have the same number of rows i;
second, concatenate H1, H2 and H3 along columns to form one matrix Hc[i, j1+j2+j3];
third, set the number of rows of the first layer of the main network to i;
fourth, input Hc into the main network.
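The four steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the author's implementation: the sizes (i = 4, j1 = 2, j2 = 3, j3 = 5), the random hidden states, the helper name `combine_along_columns`, and the main network's first layer (assumed here to be a plain linear map over the j1+j2+j3 columns, with hypothetical weight `W` and bias `b`) are all illustrative.

```python
import numpy as np

def combine_along_columns(hidden_states):
    """Concatenate prebranch outputs H1[i, j1], H2[i, j2], ... into Hc[i, j1+j2+...]."""
    # Step 1: every prebranch output must have the same number of rows i.
    rows = {h.shape[0] for h in hidden_states}
    if len(rows) != 1:
        raise ValueError(f"all prebranch outputs must have the same number of rows, got {sorted(rows)}")
    # Step 2: join the matrices side by side, i.e. along columns (axis=1).
    return np.concatenate(hidden_states, axis=1)

# Hypothetical sizes and random hidden states standing in for the 3 prebranch outputs.
rng = np.random.default_rng(0)
i, j1, j2, j3 = 4, 2, 3, 5
H1 = rng.standard_normal((i, j1))
H2 = rng.standard_normal((i, j2))
H3 = rng.standard_normal((i, j3))

Hc = combine_along_columns([H1, H2, H3])
print(Hc.shape)  # (4, 10)

# Step 3 (assumption): a hypothetical first layer of the main network that
# expects an input with i rows and applies a linear map over its columns.
out_dim = 6
W = rng.standard_normal((j1 + j2 + j3, out_dim))
b = np.zeros(out_dim)

def main_first_layer(x):
    if x.shape[0] != i:
        raise ValueError(f"main network's first layer expects {i} rows, got {x.shape[0]}")
    return x @ W + b  # linear layer: (i, j1+j2+j3) @ (j1+j2+j3, out_dim) -> (i, out_dim)

# Step 4: input Hc into the main network.
y = main_first_layer(Hc)
print(y.shape)  # (4, 6)
```

Because the concatenation is along columns, the row count i is preserved and each prebranch keeps its own block of columns in Hc (columns 0 to j1-1 come from H1, and so on), so any mismatch in row counts is caught before the main network is reached.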