seq_len: variable
num_experts, num_local_experts, hidden_size, intermediate_size, gemm1_out_size, num_hidden_blocks, num_intermediate_blocks, num_gemm1_out_blocks: constant
routing_logits: Tensor of routing logits for expert selection [seq_len, num_experts]
routing_bias: Bias tensor for routing. Pass all zeros for no bias [num_experts]
hidden_states: Input hidden states tensor (FP8 quantized) [seq_len, hidden_size]
hidden_states_scale: Block-wise scaling factors for hidden states [num_hidden_blocks, seq_len]
gemm1_weights: First GEMM weights for all local experts (gate and up projections) [num_local_experts, gemm1_out_size, hidden_size]
gemm1_weights_scale: Block-wise scaling factors for first GEMM weights [num_local_experts, num_gemm1_out_blocks, num_hidden_blocks]
gemm2_weights: Second GEMM weights for all local experts (down projection) [num_local_experts, hidden_size, intermediate_size]
gemm2_weights_scale: Block-wise scaling factors for second GEMM weights [num_local_experts, num_hidden_blocks, num_intermediate_blocks]
local_expert_offset: Offset of local experts in global expert space (scalar)
routed_scaling_factor: Scaling factor for routing weights (scalar)
output: Final MoE output tensor [seq_len, hidden_size]
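
The shapes listed above can be collected into a small lookup table, which is handy for validating inputs before a kernel call. The following is a minimal sketch in plain Python, not part of any library: the helper names `moe_tensor_shapes` and `mismatched_tensors` are hypothetical, and the axis values in the usage lines are illustrative only. The block counts are passed in explicitly because the block size used for scaling is not stated here.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def moe_tensor_shapes(
    seq_len: int,
    num_experts: int,
    num_local_experts: int,
    hidden_size: int,
    intermediate_size: int,
    gemm1_out_size: int,
    num_hidden_blocks: int,
    num_intermediate_blocks: int,
    num_gemm1_out_blocks: int,
) -> Dict[str, Shape]:
    """Return the expected shape of each tensor, keyed by its name."""
    return {
        "routing_logits": (seq_len, num_experts),
        "routing_bias": (num_experts,),
        "hidden_states": (seq_len, hidden_size),
        # Note the layout: blocks first, sequence second.
        "hidden_states_scale": (num_hidden_blocks, seq_len),
        "gemm1_weights": (num_local_experts, gemm1_out_size, hidden_size),
        "gemm1_weights_scale": (
            num_local_experts, num_gemm1_out_blocks, num_hidden_blocks),
        "gemm2_weights": (num_local_experts, hidden_size, intermediate_size),
        "gemm2_weights_scale": (
            num_local_experts, num_hidden_blocks, num_intermediate_blocks),
        "output": (seq_len, hidden_size),
    }


def mismatched_tensors(
    actual: Dict[str, Shape], expected: Dict[str, Shape]
) -> List[str]:
    """List the names whose actual shape is missing or differs."""
    return [
        name
        for name, shape in expected.items()
        if tuple(actual.get(name, ())) != shape
    ]


# Illustrative axis values (assumptions, not taken from the specification).
expected = moe_tensor_shapes(
    seq_len=4, num_experts=8, num_local_experts=2,
    hidden_size=256, intermediate_size=128, gemm1_out_size=256,
    num_hidden_blocks=2, num_intermediate_blocks=1, num_gemm1_out_blocks=2,
)
problems = mismatched_tensors(expected, expected)  # empty list: all match
```

The scalar inputs `local_expert_offset` and `routed_scaling_factor` carry no shape and are therefore left out of the table.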

