Multiply-Accumulate
MADD and MSUB: Fused Multiply-Accumulate
Many real-world computations follow the pattern a + (b * c) or a - (b * c). ARM64 provides dedicated instructions, MADD and MSUB, that perform the multiply and the add (or subtract) in a single instruction instead of two.
"Warp 9.975, Mr. La Forge!" -- computing warp factors requires some serious multiply-accumulate math. Fortunately, MADD does it in a single instruction.
MADD -- Multiply-Add
MADD computes Xd = Xa + (Xn * Xm):
MADD X0, X1, X2, X3 // X0 = X3 + (X1 * X2)
The order of operands is: destination, first multiply source, second multiply source, addend.
MUL itself is not a separate instruction: MUL Xd, Xn, Xm is an alias for MADD Xd, Xn, Xm, XZR (multiply, then add zero).
MSUB -- Multiply-Subtract
MSUB computes Xd = Xa - (Xn * Xm):
MSUB X0, X1, X2, X3 // X0 = X3 - (X1 * X2)
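This is extremely useful for computing remainders. ARM64 has no remainder instruction, so a remainder is computed from the quotient as a - (a / b) * b. Remember the remainder pattern?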
// Old way: three instructions
UDIV X2, X0, X1 // quotient = a / b
MUL X3, X2, X1 // temp = quotient * b
SUB X4, X0, X3 // remainder = a - temp
// New way: two instructions with MSUB
UDIV X2, X0, X1 // quotient = a / b
MSUB X4, X2, X1, X0 // remainder = a - (quotient * b)
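The same idiom can be modelled in Python to confirm that UDIV followed by MSUB reproduces the remainder. This is a sketch, not ARM64 code: `udiv`, `msub`, and `urem` are hypothetical helpers, the operands are assumed to be unsigned 64-bit values, and the zero-divisor branch mirrors the architectural behavior of UDIV, which returns 0 rather than trapping.

```python
# Python model of the UDIV + MSUB remainder idiom (unsigned 64-bit operands assumed).
MASK64 = (1 << 64) - 1


def udiv(xn: int, xm: int) -> int:
    """Model of UDIV: unsigned quotient; ARM64 returns 0 for a zero divisor."""
    if xm == 0:
        return 0
    return (xn // xm) & MASK64


def msub(xn: int, xm: int, xa: int) -> int:
    """Model of MSUB Xd, Xn, Xm, Xa: Xd = Xa - (Xn * Xm), wrapped to 64 bits."""
    return (xa - xn * xm) & MASK64


def urem(a: int, b: int) -> int:
    q = udiv(a, b)        # UDIV X2, X0, X1
    return msub(q, b, a)  # MSUB X4, X2, X1, X0


print(urem(17, 5))  # 2
print(urem(17, 0))  # 17: quotient is 0, so a - 0 * b = a
```

Because UDIV yields 0 for a zero divisor, this idiom silently returns a in that case; check the divisor first if your program should treat it as an error.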
Dot Product
The dot product of two vectors is a fundamental operation in linear algebra, graphics, machine learning, and signal processing. For two vectors A and B of length N:
dot = A[0]*B[0] + A[1]*B[1] + ... + A[N-1]*B[N-1]
MADD is perfect for this. Start with an accumulator of 0 and repeatedly multiply-add:
MOV X4, #0 // accumulator = 0
// For each pair (a, b), loaded into X5 and X6:
MADD X4, X5, X6, X4 // accumulator += a * b
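Here is the accumulation loop as a Python sketch, with a hypothetical `madd` helper standing in for the instruction and the accumulator wrapping at 64 bits the way X4 would. Only the accumulate step corresponds to MADD; loading each pair into registers is left out.

```python
# Python sketch of a dot product accumulated with MADD semantics.
MASK64 = (1 << 64) - 1


def madd(xn: int, xm: int, xa: int) -> int:
    """Model of MADD: xa + xn * xm, wrapped to 64 bits."""
    return (xa + xn * xm) & MASK64


def dot(a: list, b: list) -> int:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0                    # MOV X4, #0
    for x, y in zip(a, b):
        acc = madd(x, y, acc)  # one MADD per pair: acc += x * y
    return acc


print(dot([1, 2, 3], [4, 5, 6]))  # 32
```

An N-element dot product therefore costs exactly N MADDs after the accumulator is cleared.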
Polynomial Evaluation
Another common application is evaluating polynomials. For ax^2 + bx + c, Horner's method rewrites the polynomial as (a*x + b)*x + c, so each step is a single MADD:
// Evaluate 3x^2 + 2x + 1 at x=5
MOV X0, #5 // x
MOV X1, #3 // a
MOV X2, #2 // b
MOV X3, #1 // c
MADD X1, X1, X0, X2 // X1 = a*x + b = 17
MADD X1, X1, X0, X3 // X1 = (a*x + b)*x + c = 86
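Horner's method generalizes to any degree: a polynomial of degree n needs exactly n MADDs, one per coefficient after the leading one. A Python sketch, where `horner` and `madd` are hypothetical helpers and the coefficients are listed from the highest power down to the constant term:

```python
# Python sketch of Horner's method using MADD semantics for each step.
MASK64 = (1 << 64) - 1


def madd(xn: int, xm: int, xa: int) -> int:
    """Model of MADD: xa + xn * xm, wrapped to 64 bits."""
    return (xa + xn * xm) & MASK64


def horner(coeffs: list, x: int) -> int:
    """Evaluate a polynomial; coeffs run from the highest power down to the constant."""
    if not coeffs:
        return 0
    acc = coeffs[0] & MASK64   # leading coefficient, like MOV X1, #3
    for c in coeffs[1:]:
        acc = madd(acc, x, c)  # acc = acc * x + c
    return acc


print(horner([3, 2, 1], 5))  # 86, matching the MADD sequence above
```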
Your Task
Compute the dot product of two 4-element vectors:
- A = [3, 5, 2, 4]
- B = [1, 4, 6, 2]
The dot product is: 3*1 + 5*4 + 2*6 + 4*2 = 3 + 20 + 12 + 8 = 43.
Use MADD for the accumulation. Print the result (43) followed by a newline.