Add Sifive Xsfvqmacc extension intrinsics #5

yulong18 · 2024-09-29T06:58:10Z

No description provided.

yulong18 · 2024-09-29T07:06:19Z

Hi, @kito-cheng and @monkchiang :
I create a new PR. I create a new temporary branch(sifive-rvv-intrinsic-temp)，but I will delete it later.
I put the test case in the comment.
`#include "riscv_vector.h"

vint32m1_t test1(vint32m1_t vd, vint8m1_t vs1, vint8mf2_t vs2, size_t vl) {
return __riscv_sf_vqmacc_4x8x4_i32m1(vd, vs1, vs2, vl);
}`

kito-cheng · 2024-09-30T05:30:14Z

gcc/common/config/riscv/riscv-common.cc

@@ -1750,6 +1752,9 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =

  {"xsfvcp",   &gcc_options::x_riscv_sifive_subext, MASK_XSFVCP},
  {"xsfcease", &gcc_options::x_riscv_sifive_subext, MASK_XSFCEASE},
+  {"xsfvqmaccqoq", &gcc_options::x_riscv_sifive_subext, MASK_XSFVQMACCQOQ},
+  {"xsfvqmaccdod", &gcc_options::x_riscv_sifive_subext, MASK_XSFVQMACCDOD},
+


Drop this blank line

kito-cheng · 2024-09-30T05:30:24Z

gcc/config/riscv/riscv-c.cc

@@ -215,6 +215,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
    builtin_define_with_int_value ("__riscv_th_v_intrinsic",
 				   riscv_ext_version_value (0, 11));

+


Drop this blank line

kito-cheng · 2024-09-30T05:31:16Z

gcc/config/riscv/riscv-vector-builtins-shapes.cc

@@ -1321,4 +1354,5 @@ SHAPE(seg_fault_load, seg_fault_load)
 SHAPE(crypto_vv, crypto_vv)
 SHAPE(crypto_vi, crypto_vi)
 SHAPE(crypto_vv_no_op_type, crypto_vv_no_op_type)
+SHAPE(sf_vqmacc,sf_vqmacc)


Suggested change

SHAPE(sf_vqmacc,sf_vqmacc)

SHAPE(sf_vqmacc, sf_vqmacc)

kito-cheng

Could you add testcase?

Also you could copy /contrib/clang-format to /.clang-format, then you can use clang-format or git clang-format to format your code

kito-cheng · 2024-10-25T05:12:11Z

gcc/config/riscv/riscv-vector-builtins-bases.cc

+      return e.use_exact_insn (
+	  code_for_pred_fnr_clip (ZERO_EXTEND, e.vector_mode ()));


How to distinguish between x and xu here?

kito-cheng · 2024-10-25T05:13:09Z

gcc/config/riscv/riscv-vector-builtins-bases.cc

+      gcc_unreachable ();
+    }
+  }


Suggested change

gcc_unreachable ();

}

}

}

gcc_unreachable ();

}

gcc_unreachable in wrong level I think?

kito-cheng · 2024-10-25T05:13:32Z

gcc/config/riscv/riscv-vector-builtins-bases.cc

+static CONSTEXPR const vfnrclip x_obj;
+static CONSTEXPR const vfnrclip xu_obj;


sf_vfnrclip_x_obj;
sf_vfnrclip_xu_obj;

kito-cheng · 2024-10-25T05:14:57Z

gcc/config/riscv/riscv-vector-builtins-bases.cc

+    if (e.op_info->op == OP_TYPE_4x8x4)
+      return e.use_widen_ternop_insn (
+    code_for_pred_quad_mul_plusus_qoq (e.vector_mode ()));


kito-cheng · 2024-10-25T05:15:16Z

gcc/config/riscv/riscv-vector-builtins-bases.cc

+static CONSTEXPR const vqmacc vqmacc_obj;
+static CONSTEXPR const vqmaccu vqmaccu_obj;
+static CONSTEXPR const vqmaccsu vqmaccsu_obj;
+static CONSTEXPR const vqmaccsu vqmaccus_obj;


add sf_ prefix

kito-cheng · 2024-10-25T05:15:57Z

gcc/config/riscv/riscv-vector-builtins-shapes.cc

+  /* vop_v --> vop_v_<type>.  */
+  b.append_name (type_suffixes[instance.type.index].vector);


kito-cheng · 2024-10-25T05:16:17Z

gcc/config/riscv/riscv-vector-builtins-shapes.cc

+    if (overloaded_p && (instance.pred == PRED_TYPE_tu || instance.pred == PRED_TYPE_mu ||
+    instance.pred == PRED_TYPE_tumu))


This patch folds svindex with constant arguments into a vector series. We implemented this in svindex_impl::fold using the function build_vec_series. For example, svuint64_t f1 () { return svindex_u642 (10, 3); } compiled with -O2 -march=armv8.2-a+sve, is folded to {10, 13, 16, ...} in the gimple pass lower. This optimization benefits cases where svindex is used in combination with other gimple-level optimizations. For example, svuint64_t f2 () { return svmul_x (svptrue_b64 (), svindex_u64 (10, 3), 5); } has previously been compiled to f2: index z0.d, riscvarchive#10, #3 mul z0.d, z0.d, #5 ret Now, it is compiled to f2: mov x0, 50 index z0.d, x0, riscvarchive#15 ret We added test cases checking - the application of the transform during gimple for constant arguments, - the interaction with another gimple-level optimization. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ * config/aarch64/aarch64-sve-builtins-base.cc (svindex_impl::fold): Add constant folding. gcc/testsuite/ * gcc.target/aarch64/sve/index_const_fold.c: New test.

We can make use of the integrated rotate step of the XAR instruction to implement most vector integer rotates, as long we zero out one of the input registers for it. This allows for a lower-latency sequence than the fallback SHL+USRA, especially when we can hoist the zeroing operation away from loops and hot parts. This should be safe to do for 64-bit vectors as well even though the XAR instructions operate on 128-bit values, as the bottom 64-bit results is later accessed through the right subregs. This strategy is used whenever we have XAR instructions, the logic in aarch64_emit_opt_vec_rotate is adjusted to resort to expand_rotate_as_vec_perm only when it's expected to generate a single REV* instruction or when XAR instructions are not present. With this patch we can gerate for the input: v4si G1 (v4si r) { return (r >> 23) | (r << 9); } v8qi G2 (v8qi r) { return (r << 3) | (r >> 5); } the assembly for +sve2: G1: movi v31.4s, 0 xar z0.s, z0.s, z31.s, riscvarchive#23 ret G2: movi v31.4s, 0 xar z0.b, z0.b, z31.b, #5 ret instead of the current: G1: shl v31.4s, v0.4s, 9 usra v31.4s, v0.4s, 23 mov v0.16b, v31.16b ret G2: shl v31.8b, v0.8b, 3 usra v31.8b, v0.8b, 5 mov v0.8b, v31.8b ret Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add generation of XAR sequences when possible. gcc/testsuite/ * gcc.target/aarch64/rotate_xar_1.c: New test.

kito-cheng · 2024-11-04T10:58:51Z

gcc/config/riscv/riscv-vector-builtins.h

+/* Return true if intrinsics maybe require qfrm operand.  */
+  virtual bool may_require_qfrm_p () const;
+


What's qfrm ?

kito-cheng · 2024-11-04T10:59:01Z

gcc/config/riscv/riscv-vector-builtins.h

+/* We choose to return false by default since most of the intrinsics does
+   not need qfrm operand.  */
+inline bool
+function_base::may_require_qfrm_p () const
+{
+  return false;
+}
+


kito-cheng · 2024-11-05T14:02:22Z

gcc/testsuite/gcc.target/riscv/rvv/sf_vfnrclip.c

+#include "riscv_vector.h"
+
+vint8mf8_t test1(float vs1, vfloat32mf2_t vs2, size_t vl) {
+        return  __riscv_sf_vfnrclip_x_f_qf_i8mf8(vs2, vs1, vl);
+}
+


Could you reference this file https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.target/riscv/target-attr-01.c and add correct test directive?
e.g.

/* { dg-do compile } */ /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */ /* { dg-options "-march=rv64gcv_xsfvfnrclipxfqf -O2 -mabi=lp64d" } */ /* { dg-final { check-function-bodies "**" "" } } */

and add some check within the testcase like:

/* ** foo: ** ... ** vsetvli\s*x0, a0, e32, m1, ta, ta ** sf.vfnrclip.x.f.qf\s*fa0,v8 ** ... */

And don't forgot sf_vqmacc.c

kito-cheng · 2024-11-05T14:03:36Z

gcc/testsuite/gcc.target/riscv/rvv/sf_vfnrclip.c

+vint8m2_t test2(float vs1, vfloat32m8_t vs2, size_t vl) {
+        return  __riscv_sf_vfnrclip_x_f_qf_i8m2(vs2, vs1, vl);
+}


Also testcase for __riscv_sf_vfnrclip_xu_f_qf_*, not really need all combination but please add few to improve the coverage

kito-cheng · 2024-11-05T14:04:56Z

gcc/config/riscv/riscv.md

@@ -381,6 +381,7 @@
 ;; vidiv       vector single-width integer divide instructions
 ;; viwmul      vector widening integer multiply instructions
 ;; vimuladd    vector single-width integer multiply-add instructions
+;; vsfmuladd   vector matrix integer multiply-add instructions


Minor comment, please put this in the end of the comment, don't mixed with standard instruction

kito-cheng · 2024-11-05T14:05:01Z

gcc/config/riscv/riscv.md

@@ -390,6 +391,7 @@
 ;; vsmul       vector single-width fractional multiply with rounding and saturation instructions
 ;; vsshift     vector single-width scaling shift instructions
 ;; vnclip      vector narrowing fixed-point clip instructions
+;; vsfclip     vector fp32 to int8 ranged clip instructions


yulong18 · 2024-11-21T07:09:19Z

Hi, @kito-cheng :
There are two new commits that add the support for vqmacc. I tested it and all testcases passed. Please review it. Thanks!
Today, I also will add the code that support vfnrclipxfqf and vcix extensions for this PR.

kito-cheng · 2024-11-21T10:17:45Z

gcc/config/riscv/riscv-vector-builtins-shapes.cc

+    /* According to SIFIVE vector-intrinsic-doc, it adds  suffixes
+       for vop_m C++ overloaded API.*/


the comment seems not updated?

kito-cheng · 2024-11-21T10:18:25Z

gcc/config/riscv/riscv-vector-builtins-bases.cc

+    return e.use_exact_insn (
+      code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));
+    gcc_unreachable ();


Suggested change

return e.use_exact_insn (

code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));

gcc_unreachable ();

return e.use_exact_insn (

code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));

kito-cheng · 2024-11-21T10:18:51Z

gcc/config/riscv/riscv-vector-builtins-bases.cc

+    return e.use_exact_insn (
+      code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));
+    gcc_unreachable ();


Suggested change

return e.use_exact_insn (

code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));

gcc_unreachable ();

return e.use_exact_insn (

code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));

kito-cheng · 2024-11-21T10:19:48Z

gcc/config/riscv/riscv-vector-builtins-bases.cc

@@ -869,6 +918,105 @@ class vwmaccus : public function_base
  }
 };

+/* Implements vqmacc.  */
+class vqmacc : public function_base


Suggested change

class vqmacc : public function_base

class sf_vqmacc : public function_base

kito-cheng · 2024-11-21T10:20:01Z

gcc/config/riscv/riscv-vector-builtins-bases.cc

+};
+
+/* Implements vqmaccu.  */
+class vqmaccu : public function_base


Suggested change

class vqmaccu : public function_base

class sf_vqmaccu : public function_base

kito-cheng · 2024-11-21T10:20:35Z

gcc/config/riscv/riscv-vector-builtins-bases.cc

+};
+
+/* Implements vqmaccus.  */
+class vqmaccus : public function_base


Suggested change

class vqmaccus : public function_base

class sf_vqmaccus : public function_base

kito-cheng · 2024-11-21T10:22:12Z

gcc/config/riscv/riscv-vector-builtins-bases.h

+extern const function_base *const sf_vqmacc;
+extern const function_base *const sf_vqmaccu;
+extern const function_base *const sf_vqmaccsu;
+extern const function_base *const sf_vqmaccus;


Could you create a new file sifive-vector-builtins-bases.h to hold sifive intrinsic

kito-cheng · 2024-11-21T10:22:57Z

gcc/config/riscv/riscv-vector-builtins-bases.cc

+/* Implements vfnrclip.  */
+template <int UNSPEC, enum frm_op_type FRM_OP = NO_FRM>
+class vfnrclip_x_f_qf : public function_base


create sifive-vector-builtins-bases.cc to hold those intrinsic

kito-cheng · 2024-11-21T10:23:54Z

gcc/config/riscv/riscv.md

@@ -390,6 +390,7 @@
 ;; vsmul       vector single-width fractional multiply with rounding and saturation instructions
 ;; vsshift     vector single-width scaling shift instructions
 ;; vnclip      vector narrowing fixed-point clip instructions
+;; vfnrclip     vector fp32 to int8 ranged clip instructions


They are not standard instruction, so plz create a new section in the comment rather than mixed in the std section, also plz prefixed with sf_

kito-cheng · 2024-11-21T10:24:01Z

gcc/config/riscv/riscv.md

@@ -475,6 +476,7 @@
 ;; vfncvtbf16  vector narrowing single floating-point to brain floating-point instruction
 ;; vfwcvtbf16  vector widening brain floating-point to single floating-point instruction
 ;; vfwmaccbf16  vector BF16 widening multiply-accumulate
+;; vqmacc      vector matrix integer multiply-add instructions


yulong18 · 2024-11-22T09:40:12Z

Hi, @kito-cheng and @pz9115 :
We add the new commits for vqmacc.
Please review it. If there are no comment, We will send patch to upstream.

yulong18 · 2024-11-26T01:00:03Z

Hi, @kito-cheng :
We have rebased the code. Please help review the vqmacc code. Thanks.
We want send the vqmacc patchset to upstream in few days.

kito-cheng reviewed Oct 1, 2024

View reviewed changes

kito-cheng reviewed Oct 25, 2024

View reviewed changes

yulong18 force-pushed the sifive-rvv-intrinsic-temp branch from 22a7f1e to 584fce9 Compare November 4, 2024 10:56

kito-cheng reviewed Nov 4, 2024

View reviewed changes

kito-cheng reviewed Nov 5, 2024

View reviewed changes

yulong18 force-pushed the sifive-rvv-intrinsic-temp branch from 584fce9 to da24b1c Compare November 21, 2024 07:00

yulong18 force-pushed the sifive-rvv-intrinsic-temp branch from da24b1c to 16f13a0 Compare November 21, 2024 07:23

pz9115 requested a review from kito-cheng November 21, 2024 08:43

kito-cheng reviewed Nov 21, 2024

View reviewed changes

yulong18 added 2 commits November 22, 2024 16:46

RISC-V: Add intrinsics support for SiFive Xsfvqmaccqoq/dod extensions.

f053bad

RISC-V: Add intrinsics testcases for SiFive Xsfvqmaccqoq/dod extensions.

1f32394

yulong18 force-pushed the sifive-rvv-intrinsic-temp branch from 2c9c076 to 1f32394 Compare November 22, 2024 09:35

yulong18 added 2 commits November 22, 2024 18:29

RISC-V: Add intrinsics support for SiFive Xsfvfnrclipxfqf extensions.

e837693

RISC-V: Add intrinsics testcases for SiFive Xsfvfnrclipxfqf extensions.

124e727

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Sifive Xsfvqmacc extension intrinsics #5

Add Sifive Xsfvqmacc extension intrinsics #5

yulong18 commented Sep 29, 2024

yulong18 commented Sep 29, 2024

kito-cheng Sep 30, 2024

kito-cheng Sep 30, 2024

kito-cheng Sep 30, 2024

kito-cheng left a comment

kito-cheng Oct 25, 2024

kito-cheng Oct 25, 2024

kito-cheng Oct 25, 2024

kito-cheng Oct 25, 2024

kito-cheng Oct 25, 2024

kito-cheng Oct 25, 2024

kito-cheng Oct 25, 2024

kito-cheng Nov 4, 2024

kito-cheng Nov 4, 2024

kito-cheng Nov 5, 2024

kito-cheng Nov 5, 2024

kito-cheng Nov 5, 2024

kito-cheng Nov 5, 2024

yulong18 commented Nov 21, 2024

kito-cheng Nov 21, 2024

kito-cheng Nov 21, 2024

kito-cheng Nov 21, 2024

kito-cheng Nov 21, 2024

kito-cheng Nov 21, 2024

kito-cheng Nov 21, 2024

kito-cheng Nov 21, 2024

kito-cheng Nov 21, 2024

kito-cheng Nov 21, 2024

kito-cheng Nov 21, 2024

yulong18 commented Nov 22, 2024

yulong18 commented Nov 26, 2024

		@@ -215,6 +215,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
		builtin_define_with_int_value ("__riscv_th_v_intrinsic",
		riscv_ext_version_value (0, 11));

		return e.use_exact_insn (
		code_for_pred_fnr_clip (ZERO_EXTEND, e.vector_mode ()));

		static CONSTEXPR const vfnrclip x_obj;
		static CONSTEXPR const vfnrclip xu_obj;

		/* vop_v --> vop_v_<type>. */
		b.append_name (type_suffixes[instance.type.index].vector);

		if (overloaded_p && (instance.pred == PRED_TYPE_tu \|\| instance.pred == PRED_TYPE_mu \|\|
		instance.pred == PRED_TYPE_tumu))

		/* Return true if intrinsics maybe require qfrm operand. */
		virtual bool may_require_qfrm_p () const;

		/* According to SIFIVE vector-intrinsic-doc, it adds suffixes
		for vop_m C++ overloaded API.*/

	class vqmacc : public function_base
	class sf_vqmacc : public function_base

	class vqmaccu : public function_base
	class sf_vqmaccu : public function_base

	class vqmaccus : public function_base
	class sf_vqmaccus : public function_base

Add Sifive Xsfvqmacc extension intrinsics #5

Are you sure you want to change the base?

Add Sifive Xsfvqmacc extension intrinsics #5

Conversation

yulong18 commented Sep 29, 2024

yulong18 commented Sep 29, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kito-cheng left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yulong18 commented Nov 21, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yulong18 commented Nov 22, 2024

yulong18 commented Nov 26, 2024