The link register (LR) is used for better performance, but on its own it only works for leaf functions (functions that don't call any other function).
Working scenario (leaf functions)
```c
void foo(void) { return; }
void bar(void) { return; }

int main(void)
{
    foo(); // BL foo: LR = return address (next instruction in main), foo returns via LR
    bar(); // BL bar: LR = return address (next instruction in main), bar returns via LR
    return 0;
}
```
Nice, it’s fast, because the return address is kept only in a register (LR) and never written to the stack, so no memory access is needed to save or restore it.
Not working scenario (non-leaf function)
```c
void bar(void);

void foo(void)
{
    bar();  // BL bar overwrites LR with the return address inside foo
    return; // If LR was not saved, this jumps back into foo (right after bar()), not to main!
}

void bar(void) { return; }

int main(void)
{
    foo(); // BL foo: LR = return address in main, but foo clobbers it by calling bar
    return 0;
}
```
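Shit, for non-leaf functions we have to do what Intel/AMD (x86) does on every call, where CALL pushes the return address onto the stack: we have to store LR on the stack ourselves. On ARM that means saving LR in the function prologue (typically `push {lr}`) and restoring it in the epilogue (e.g. `pop {pc}`).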