vllm.model_executor.layers.quantization.kernels.scaled_mm.rocm ¶
ROCmFP8ScaledMMLinearKernel ¶
Bases: FP8ScaledMMLinearKernel
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/rocm.py
apply_scaled_mm ¶
```python
apply_scaled_mm(
    *,
    A: Tensor,
    B: Tensor,
    out_dtype: dtype,
    As: Tensor,
    Bs: Tensor,
    bias: Tensor | None,
    output_shape: list,
) -> Tensor
```
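The signature above describes a per-tensor W8A8 scaled matmul: quantized operands `A` and `B` are dequantized with scalar scales `As` and `Bs`, multiplied, bias is added, and the result is cast to `out_dtype`. A minimal NumPy sketch of those semantics (the function name is hypothetical, and `int8` stands in for FP8, which NumPy does not provide):

```python
import numpy as np

def scaled_mm_reference(A, B, out_dtype, As, Bs, bias=None):
    # Hypothetical reference for per-tensor scaled-matmul semantics:
    # dequantize each quantized operand with its scalar scale,
    # matmul in float32, add bias, then cast to the requested dtype.
    out = (A.astype(np.float32) * As) @ (B.astype(np.float32) * Bs)
    if bias is not None:
        out = out + bias
    return out.astype(out_dtype)

A = np.array([[1, 2], [3, 4]], dtype=np.int8)   # quantized activations
B = np.array([[1, 0], [0, 1]], dtype=np.int8)   # quantized weights
out = scaled_mm_reference(A, B, np.float16, As=0.5, Bs=2.0)
print(out)  # the scales 0.5 * 2.0 cancel, so this prints A as float16
```

The real ROCm kernel fuses the dequantization and accumulation on-device; this sketch only illustrates the numerics the arguments imply.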
can_implement classmethod ¶
```python
can_implement(
    c: FP8ScaledMMLinearLayerConfig,
) -> tuple[bool, str | None]
```
is_supported classmethod ¶
rocm_per_tensor_float_w8a8_scaled_mm_fake ¶
```python
rocm_per_tensor_float_w8a8_scaled_mm_fake(
    A: Tensor,
    B: Tensor,
    out_dtype: dtype,
    As: Tensor,
    Bs: Tensor,
    bias: Tensor,
) -> Tensor
```
rocm_per_tensor_float_w8a8_scaled_mm_impl ¶
```python
rocm_per_tensor_float_w8a8_scaled_mm_impl(
    A: Tensor,
    B: Tensor,
    out_dtype: dtype,
    As: Tensor,
    Bs: Tensor,
    bias: Tensor,
) -> Tensor
```
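The matching `_impl`/`_fake` pair follows PyTorch's custom-op convention: the `_impl` function runs the real kernel, while the `_fake` (meta) function produces only an output of the correct shape and dtype so `torch.compile` can trace through the op without executing device code. A NumPy sketch of the split (function names are hypothetical; `int8` again stands in for FP8):

```python
import numpy as np

def scaled_mm_impl(A, B, out_dtype, As, Bs, bias):
    # Real computation: dequantize, matmul, add bias, cast.
    out = (A.astype(np.float32) * As) @ (B.astype(np.float32) * Bs) + bias
    return out.astype(out_dtype)

def scaled_mm_fake(A, B, out_dtype, As, Bs, bias):
    # "Fake" (meta) variant: only shape/dtype propagation, no math.
    # A tracer calls this to learn what the op produces.
    return np.empty((A.shape[0], B.shape[1]), dtype=out_dtype)

A = np.ones((2, 3), dtype=np.int8)
B = np.ones((3, 4), dtype=np.int8)
bias = np.zeros(4, dtype=np.float32)
real = scaled_mm_impl(A, B, np.float16, 1.0, 1.0, bias)
fake = scaled_mm_fake(A, B, np.float16, 1.0, 1.0, bias)
# The fake op must agree with the impl on shape and dtype.
assert real.shape == fake.shape and real.dtype == fake.dtype
```

Keeping the fake function's shape/dtype logic in lockstep with the implementation is the contract that makes the registered op safe to trace.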