vllm.model_executor.layers.fused_moe.oracle.unquantized ¶
UnquantizedMoeBackend ¶
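The reference does not list this backend selector's members. As rough orientation only, a backend enum of this kind typically looks like the hypothetical sketch below; the member names are invented for illustration and are not taken from vLLM.

```python
import enum

# Hypothetical stand-in for UnquantizedMoeBackend. The real members are
# not shown in this reference, so these names are invented for
# illustration only.
class UnquantizedMoeBackendSketch(enum.Enum):
    TRITON = "triton"            # e.g. a default fused-Triton path
    MODULAR_KERNEL = "modular"   # e.g. a modular-kernel path for EP/DP
```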
convert_to_unquantized_kernel_format ¶
```python
convert_to_unquantized_kernel_format(
    unquantized_backend: UnquantizedMoeBackend,
    layer: Module,
    w13_weight: Tensor | None = None,
    w2_weight: Tensor | None = None,
) -> tuple[Tensor, Tensor]
```
Source code in vllm/model_executor/layers/fused_moe/oracle/unquantized.py
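A minimal sketch of what a weight-format conversion of this kind does, assuming the usual stacked-expert layout (`w13_weight` holding the concatenated gate/up projections, `w2_weight` the down projection). This is illustrative, not the vLLM implementation, which dispatches on the selected backend:

```python
import torch

# Illustrative sketch only -- not the vLLM implementation. It shows the
# kind of transformation such a converter performs: putting the stacked
# expert weights into the memory layout a kernel backend expects.
def convert_weights_sketch(
    w13_weight: torch.Tensor,  # (num_experts, 2 * intermediate, hidden)
    w2_weight: torch.Tensor,   # (num_experts, hidden, intermediate)
) -> tuple[torch.Tensor, torch.Tensor]:
    # Many fused kernels require contiguous tensors in a fixed layout;
    # a real converter may also transpose or repack per backend.
    return w13_weight.contiguous(), w2_weight.contiguous()

num_experts, hidden, intermediate = 8, 64, 128
w13 = torch.randn(num_experts, 2 * intermediate, hidden)
w2 = torch.randn(num_experts, hidden, intermediate)
w13_k, w2_k = convert_weights_sketch(w13, w2)
```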
make_unquantized_moe_kernel ¶
```python
make_unquantized_moe_kernel(
    backend: UnquantizedMoeBackend,
    quant_config: FusedMoEQuantConfig,
    moe_config: FusedMoEConfig,
) -> tuple[FusedMoEModularKernel | None, bool]
```
Source code in vllm/model_executor/layers/fused_moe/oracle/unquantized.py
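The return type suggests a factory that either builds a modular kernel or returns `None` so the caller falls back to a default fused path; the meaning of the `bool` is not documented here and is assumed below. A hypothetical sketch of that pattern, with all names invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class KernelConfigSketch:
    """Invented stand-in for the quant/MoE config pair."""
    use_ep: bool  # expert parallelism
    use_dp: bool  # data parallelism

class ModularKernelSketch:
    """Invented stand-in for FusedMoEModularKernel."""

def make_kernel_sketch(
    cfg: KernelConfigSketch,
) -> tuple[ModularKernelSketch | None, bool]:
    # Returning (kernel, flag): None signals "use the default fused
    # path". The bool's meaning here (modular kernel in use) is an
    # assumption, not taken from the vLLM source.
    if cfg.use_ep or cfg.use_dp:
        return ModularKernelSketch(), True
    return None, False

kernel, used_modular = make_kernel_sketch(
    KernelConfigSketch(use_ep=True, use_dp=False)
)
print(type(kernel).__name__, used_modular)
```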
select_unquantized_moe_backend ¶
```python
select_unquantized_moe_backend(
    use_ep: bool, use_dp: bool
) -> UnquantizedMoeBackend
```
Select the primary unquantized MoE backend. Note: shape-specific fallbacks may still occur at runtime.
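A hypothetical call site, assuming vLLM is installed and using the module path from this page:

```python
from vllm.model_executor.layers.fused_moe.oracle.unquantized import (
    select_unquantized_moe_backend,
)

# Pick the backend from the parallelism flags; per the note above,
# shape-specific fallbacks can still replace it at runtime.
backend = select_unquantized_moe_backend(use_ep=True, use_dp=False)
print(backend)
```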