Last updated on May 11, 2026
I just tried two new linear units, PowerLU and ExpoLU. Both are continuous in y but, like ReLU, are in general not continuously differentiable.
In my few tests, ExpoLU beat ReLU by a huge margin (always several orders of magnitude). Each test trained a neural network with 512 features and 30 layers for 300 rounds to approximate 21 pairs of x and y, under the same settings, including learning rate.
1) PowerLU’s math:
x<-1, y=0, y’=0
-1<=x<1, y=(x+1)**p/(2**p), y’=p*(x+1)**(p-1)/(2**p)
x>=1, y=x, y’=1
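As a minimal sketch, the PowerLU pieces above can be written as a plain Python function. The name `power_lu` and the default `p=2.0` are assumptions for illustration, not something the post fixes. The point x=1, where both neighbouring pieces give y=1, is assigned to the linear piece here.

```python
def power_lu(x: float, p: float = 2.0) -> float:
    """Scalar PowerLU following the three pieces listed above.

    p=2.0 is an assumed default chosen only for this illustration.
    """
    if x < -1.0:
        return 0.0                           # flat region
    if x < 1.0:
        return (x + 1.0) ** p / (2.0 ** p)   # power curve around x=0
    return x                                 # identity region (includes x=1)


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 3.0):
        print(x, power_lu(x))
```

With p=2 the curve passes through y=0.25 at x=0 and meets the identity piece at y=1 when x=1.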
2) ExpoLU’s math:
ExpoLU has 3 parameters: a, b, p.
x<-b, y=0
-b<=x<b, y=(1/a)*(x+b)**p
x>=b, y=x+(1/a)*(2*b)**p-b
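The three ExpoLU pieces can be sketched in the same way. The function name `expo_lu` and the defaults a=2, b=1, p=2 are assumptions for illustration (that combination is one of those tried in the tests). The point x=b, where both neighbouring pieces agree, goes to the linear piece.

```python
def expo_lu(x: float, a: float = 2.0, b: float = 1.0, p: float = 2.0) -> float:
    """Scalar ExpoLU following the three pieces listed above.

    The defaults (a=2, b=1, p=2) are assumed for this illustration only.
    """
    if x < -b:
        return 0.0                           # flat region
    if x < b:
        return (x + b) ** p / a              # scaled power curve
    # Shifted identity: the offset makes the pieces meet at x=b.
    return x + (2.0 * b) ** p / a - b


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 3.0):
        print(x, expo_lu(x))
```

The offset `(2*b)**p/a - b` in the last branch is what keeps y continuous at x=b: both pieces evaluate to `(2*b)**p/a` there.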
It is best to choose a=2**n so that the division by a becomes a bit shift in hardware.
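To illustrate the bit-shift remark, here is a small sketch that assumes integer (fixed-point) arithmetic, which the post does not specify. For a non-negative integer v and a=2**n, a right shift by n gives the same result as floor division by a. The helper name `div_pow2` is hypothetical.

```python
def div_pow2(v: int, n: int) -> int:
    """Divide a non-negative integer v by a = 2**n with a right shift."""
    return v >> n


if __name__ == "__main__":
    n = 3
    a = 1 << n  # a = 2**3 = 8
    for v in (0, 7, 8, 100):
        # The shift and the floor division agree for every v tested here.
        print(v, div_pow2(v, n), v // a)
```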
Setting a=1 gives a simplified version of ExpoLU (my first version):
x<-b, y=0
-b<=x<b, y=(x+b)**p
x>=b, y=x+(2*b)**p-b
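A quick numerical check of the simplified version shows why it is continuous but not continuously differentiable at x=b. This sketch uses b=1 and p=2 (one of the combinations tried in the tests) and approximates the one-sided slopes by finite differences. The helper names and the step size h are assumptions for illustration.

```python
def expo_lu_simple(x: float, b: float = 1.0, p: float = 2.0) -> float:
    """Simplified ExpoLU (a=1) following the three pieces listed above."""
    if x < -b:
        return 0.0
    if x < b:
        return (x + b) ** p
    return x + (2.0 * b) ** p - b


def one_sided_slopes(f, x0: float, h: float = 1e-6):
    """Approximate the left and right slopes of f at x0 by finite differences."""
    left = (f(x0) - f(x0 - h)) / h
    right = (f(x0 + h) - f(x0)) / h
    return left, right


if __name__ == "__main__":
    left, right = one_sided_slopes(expo_lu_simple, 1.0)
    print("left slope at x=b:", left)
    print("right slope at x=b:", right)
```

Analytically, the left slope at x=b is p*(2*b)**(p-1), which is 4 for b=1 and p=2, while the right slope is 1, so the derivative jumps at x=b even though y itself does not.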
3) I’m trying these linear units for fun, and also for two reasons: first, to make training convergence jumpier so that it avoids small, deep holes; second, to provide some curvature around x=0 in the activation.
In my few tests of a neural network with 512 features and 30 layers approximating 21 pairs of x and y, both PowerLU and ExpoLU work with proper settings.
In those tests especially, ReLU is no match for ExpoLU under the same settings, including learning rate: ReLU falls far below ExpoLU’s performance.
ExpoLU is a real monster here. It has 3 parameters (a, b, p), but I only tried a few of the simplest combinations, such as a=2, b=1, p=2 and a=1, b=1, p=2. With a=1 it reached 1e-5 to 1e-9, and with a=2 it reached 1e-5 to 1e-7 (while ReLU reached 1e-1 only once in 3 tests).