Just tried two new LUs (PowerLU and ExpoLU), which, like ReLU, are not continuously differentiable everywhere.
1) PowerLU's math (a NumPy sketch follows the definition):
for x < -1, y = 0 and y' = 0;
for -1 <= x < 1, y = (x+1)**p / 2**p and y' = p*(x+1)**(p-1) / 2**p, which approach y = 1 and y' = p/2 as x goes to 1;
for x >= 1, y = x and y' = 1, so y is continuous at x = 1 but y' jumps from p/2 to 1 unless p = 2.
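For concreteness, here is a minimal NumPy sketch of PowerLU and its derivative based on the definition above. It assumes p > 1; the function and parameter names are mine, not from any library.

```python
import numpy as np

def power_lu(x, p=2.0):
    """PowerLU: zero below -1, a scaled power curve on [-1, 1), identity at and above 1."""
    x = np.asarray(x, dtype=float)
    t = np.clip(x + 1.0, 0.0, None)          # t = 0 for x < -1, so that branch gives y = 0
    return np.where(x < 1.0, t ** p / 2.0 ** p, x)

def power_lu_grad(x, p=2.0):
    """Derivative of PowerLU; jumps from p/2 to 1 at x = 1 unless p = 2."""
    x = np.asarray(x, dtype=float)
    t = np.clip(x + 1.0, 0.0, None)
    return np.where(x < 1.0, p * t ** (p - 1.0) / 2.0 ** p, 1.0)
```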
2) ExpoLU's math (a NumPy sketch follows the definition):
for x < -b, y = 0 and y' = 0;
for -b <= x < b, y = (x+b)**p and y' = p*(x+b)**(p-1), which approach y = (2*b)**p and y' = p*(2*b)**(p-1) as x goes to b;
for x >= b, y = x + (2*b)**p - b and y' = 1, so y = (2*b)**p at x = b and y is continuous there, while y' jumps to 1.
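A matching NumPy sketch of ExpoLU, again assuming p > 1; the default value of b is mine for illustration.

```python
def expo_lu(x, b=0.5, p=2.0):
    """ExpoLU: zero below -b, (x+b)**p on [-b, b), shifted identity x + (2b)**p - b at and above b."""
    x = np.asarray(x, dtype=float)
    t = np.clip(x + b, 0.0, None)            # t = 0 for x < -b, so that branch gives y = 0
    return np.where(x < b, t ** p, x + (2.0 * b) ** p - b)

def expo_lu_grad(x, b=0.5, p=2.0):
    """Derivative of ExpoLU; jumps from p*(2b)**(p-1) to 1 at x = b."""
    x = np.asarray(x, dtype=float)
    t = np.clip(x + b, 0.0, None)
    return np.where(x < b, p * t ** (p - 1.0), 1.0)
```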
3) I'm trying these linear units for fun and also for two reasons: first, to make training convergence jump around more so it can escape small, deep local minima; second, to provide some curvature around x = 0 in the activation.
In a test where a neural network with 512 features and 15/30 layers approximates 21 (x, y) pairs, both PowerLU and ExpoLU work with proper settings.
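As a rough illustration of that setup, here is a hypothetical PyTorch sketch: a 15-layer, 512-wide MLP with PowerLU fitted to 21 (x, y) pairs. The target function, optimizer, learning rate, and iteration count are my assumptions, not the settings used in the original test.

```python
import torch
import torch.nn as nn

class PowerLU(nn.Module):
    """PowerLU activation for PyTorch; autograd supplies the gradient."""
    def __init__(self, p=2.0):
        super().__init__()
        self.p = p
    def forward(self, x):
        t = torch.clamp(x + 1.0, min=0.0)
        return torch.where(x < 1.0, t ** self.p / 2.0 ** self.p, x)

def make_mlp(width=512, depth=15, act=PowerLU):
    """Plain MLP: `depth` Linear layers of `width` features with the given activation."""
    layers = [nn.Linear(1, width), act()]
    for _ in range(depth - 2):
        layers += [nn.Linear(width, width), act()]
    layers += [nn.Linear(width, 1)]
    return nn.Sequential(*layers)

# Hypothetical training loop on 21 (x, y) pairs; data and hyperparameters are placeholders.
xs = torch.linspace(-1, 1, 21).unsqueeze(1)
ys = torch.sin(3.0 * xs)
net = make_mlp()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(xs), ys)
    loss.backward()
    opt.step()
```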