-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Sifive Xsfvqmacc extension intrinsics #5
base: sifive-rvv-intrinsic
Are you sure you want to change the base?
Conversation
Hi, @kito-cheng and @monkchiang : vint32m1_t test1(vint32m1_t vd, vint8m1_t vs1, vint8mf2_t vs2, size_t vl) { |
@@ -1750,6 +1752,9 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] = | |||
|
|||
{"xsfvcp", &gcc_options::x_riscv_sifive_subext, MASK_XSFVCP}, | |||
{"xsfcease", &gcc_options::x_riscv_sifive_subext, MASK_XSFCEASE}, | |||
{"xsfvqmaccqoq", &gcc_options::x_riscv_sifive_subext, MASK_XSFVQMACCQOQ}, | |||
{"xsfvqmaccdod", &gcc_options::x_riscv_sifive_subext, MASK_XSFVQMACCDOD}, | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop this blank line
gcc/config/riscv/riscv-c.cc
Outdated
@@ -215,6 +215,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile) | |||
builtin_define_with_int_value ("__riscv_th_v_intrinsic", | |||
riscv_ext_version_value (0, 11)); | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop this blank line
@@ -1321,4 +1354,5 @@ SHAPE(seg_fault_load, seg_fault_load) | |||
SHAPE(crypto_vv, crypto_vv) | |||
SHAPE(crypto_vi, crypto_vi) | |||
SHAPE(crypto_vv_no_op_type, crypto_vv_no_op_type) | |||
SHAPE(sf_vqmacc,sf_vqmacc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SHAPE(sf_vqmacc,sf_vqmacc) | |
SHAPE(sf_vqmacc, sf_vqmacc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add testcase?
Also you could copy /contrib/clang-format to /.clang-format, then you can use clang-format
or git clang-format
to format your code
return e.use_exact_insn ( | ||
code_for_pred_fnr_clip (ZERO_EXTEND, e.vector_mode ())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to distinguish between x and xu here?
gcc_unreachable (); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gcc_unreachable (); | |
} | |
} | |
} | |
gcc_unreachable (); | |
} |
gcc_unreachable in wrong level I think?
static CONSTEXPR const vfnrclip x_obj; | ||
static CONSTEXPR const vfnrclip xu_obj; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sf_vfnrclip_x_obj;
sf_vfnrclip_xu_obj;
if (e.op_info->op == OP_TYPE_4x8x4) | ||
return e.use_widen_ternop_insn ( | ||
code_for_pred_quad_mul_plusus_qoq (e.vector_mode ())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
static CONSTEXPR const vqmacc vqmacc_obj; | ||
static CONSTEXPR const vqmaccu vqmaccu_obj; | ||
static CONSTEXPR const vqmaccsu vqmaccsu_obj; | ||
static CONSTEXPR const vqmaccsu vqmaccus_obj; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add sf_ prefix
/* vop_v --> vop_v_<type>. */ | ||
b.append_name (type_suffixes[instance.type.index].vector); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
if (overloaded_p && (instance.pred == PRED_TYPE_tu || instance.pred == PRED_TYPE_mu || | ||
instance.pred == PRED_TYPE_tumu)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
This patch folds svindex with constant arguments into a vector series. We implemented this in svindex_impl::fold using the function build_vec_series. For example, svuint64_t f1 () { return svindex_u642 (10, 3); } compiled with -O2 -march=armv8.2-a+sve, is folded to {10, 13, 16, ...} in the gimple pass lower. This optimization benefits cases where svindex is used in combination with other gimple-level optimizations. For example, svuint64_t f2 () { return svmul_x (svptrue_b64 (), svindex_u64 (10, 3), 5); } has previously been compiled to f2: index z0.d, riscvarchive#10, #3 mul z0.d, z0.d, #5 ret Now, it is compiled to f2: mov x0, 50 index z0.d, x0, riscvarchive#15 ret We added test cases checking - the application of the transform during gimple for constant arguments, - the interaction with another gimple-level optimization. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ * config/aarch64/aarch64-sve-builtins-base.cc (svindex_impl::fold): Add constant folding. gcc/testsuite/ * gcc.target/aarch64/sve/index_const_fold.c: New test.
We can make use of the integrated rotate step of the XAR instruction to implement most vector integer rotates, as long we zero out one of the input registers for it. This allows for a lower-latency sequence than the fallback SHL+USRA, especially when we can hoist the zeroing operation away from loops and hot parts. This should be safe to do for 64-bit vectors as well even though the XAR instructions operate on 128-bit values, as the bottom 64-bit results is later accessed through the right subregs. This strategy is used whenever we have XAR instructions, the logic in aarch64_emit_opt_vec_rotate is adjusted to resort to expand_rotate_as_vec_perm only when it's expected to generate a single REV* instruction or when XAR instructions are not present. With this patch we can gerate for the input: v4si G1 (v4si r) { return (r >> 23) | (r << 9); } v8qi G2 (v8qi r) { return (r << 3) | (r >> 5); } the assembly for +sve2: G1: movi v31.4s, 0 xar z0.s, z0.s, z31.s, riscvarchive#23 ret G2: movi v31.4s, 0 xar z0.b, z0.b, z31.b, #5 ret instead of the current: G1: shl v31.4s, v0.4s, 9 usra v31.4s, v0.4s, 23 mov v0.16b, v31.16b ret G2: shl v31.8b, v0.8b, 3 usra v31.8b, v0.8b, 5 mov v0.8b, v31.8b ret Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add generation of XAR sequences when possible. gcc/testsuite/ * gcc.target/aarch64/rotate_xar_1.c: New test.
22a7f1e
to
584fce9
Compare
/* Return true if intrinsics maybe require qfrm operand. */ | ||
virtual bool may_require_qfrm_p () const; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's qfrm ?
/* We choose to return false by default since most of the intrinsics does | ||
not need qfrm operand. */ | ||
inline bool | ||
function_base::may_require_qfrm_p () const | ||
{ | ||
return false; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
#include "riscv_vector.h" | ||
|
||
vint8mf8_t test1(float vs1, vfloat32mf2_t vs2, size_t vl) { | ||
return __riscv_sf_vfnrclip_x_f_qf_i8mf8(vs2, vs1, vl); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you reference this file https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.target/riscv/target-attr-01.c and add correct test directive?
e.g.
/* { dg-do compile } */
/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
/* { dg-options "-march=rv64gcv_xsfvfnrclipxfqf -O2 -mabi=lp64d" } */
/* { dg-final { check-function-bodies "**" "" } } */
and add some check within the testcase like:
/*
** foo:
** ...
** vsetvli\s*x0, a0, e32, m1, ta, ta
** sf.vfnrclip.x.f.qf\s*fa0,v8
** ...
*/
And don't forgot sf_vqmacc.c
vint8m2_t test2(float vs1, vfloat32m8_t vs2, size_t vl) { | ||
return __riscv_sf_vfnrclip_x_f_qf_i8m2(vs2, vs1, vl); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also testcase for __riscv_sf_vfnrclip_xu_f_qf_*
, not really need all combination but please add few to improve the coverage
gcc/config/riscv/riscv.md
Outdated
@@ -381,6 +381,7 @@ | |||
;; vidiv vector single-width integer divide instructions | |||
;; viwmul vector widening integer multiply instructions | |||
;; vimuladd vector single-width integer multiply-add instructions | |||
;; vsfmuladd vector matrix integer multiply-add instructions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment, please put this in the end of the comment, don't mixed with standard instruction
gcc/config/riscv/riscv.md
Outdated
@@ -390,6 +391,7 @@ | |||
;; vsmul vector single-width fractional multiply with rounding and saturation instructions | |||
;; vsshift vector single-width scaling shift instructions | |||
;; vnclip vector narrowing fixed-point clip instructions | |||
;; vsfclip vector fp32 to int8 ranged clip instructions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
584fce9
to
da24b1c
Compare
Hi, @kito-cheng : |
da24b1c
to
16f13a0
Compare
/* According to SIFIVE vector-intrinsic-doc, it adds suffixes | ||
for vop_m C++ overloaded API.*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment seems not updated?
return e.use_exact_insn ( | ||
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ())); | ||
gcc_unreachable (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return e.use_exact_insn ( | |
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ())); | |
gcc_unreachable (); | |
return e.use_exact_insn ( | |
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ())); |
return e.use_exact_insn ( | ||
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ())); | ||
gcc_unreachable (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return e.use_exact_insn ( | |
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ())); | |
gcc_unreachable (); | |
return e.use_exact_insn ( | |
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ())); |
@@ -869,6 +918,105 @@ class vwmaccus : public function_base | |||
} | |||
}; | |||
|
|||
/* Implements vqmacc. */ | |||
class vqmacc : public function_base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class vqmacc : public function_base | |
class sf_vqmacc : public function_base |
}; | ||
|
||
/* Implements vqmaccu. */ | ||
class vqmaccu : public function_base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class vqmaccu : public function_base | |
class sf_vqmaccu : public function_base |
}; | ||
|
||
/* Implements vqmaccus. */ | ||
class vqmaccus : public function_base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class vqmaccus : public function_base | |
class sf_vqmaccus : public function_base |
extern const function_base *const sf_vqmacc; | ||
extern const function_base *const sf_vqmaccu; | ||
extern const function_base *const sf_vqmaccsu; | ||
extern const function_base *const sf_vqmaccus; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you create a new file sifive-vector-builtins-bases.h
to hold sifive intrinsic
/* Implements vfnrclip. */ | ||
template <int UNSPEC, enum frm_op_type FRM_OP = NO_FRM> | ||
class vfnrclip_x_f_qf : public function_base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create sifive-vector-builtins-bases.cc
to hold those intrinsic
gcc/config/riscv/riscv.md
Outdated
@@ -390,6 +390,7 @@ | |||
;; vsmul vector single-width fractional multiply with rounding and saturation instructions | |||
;; vsshift vector single-width scaling shift instructions | |||
;; vnclip vector narrowing fixed-point clip instructions | |||
;; vfnrclip vector fp32 to int8 ranged clip instructions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are not standard instruction, so plz create a new section in the comment rather than mixed in the std section, also plz prefixed with sf_
gcc/config/riscv/riscv.md
Outdated
@@ -475,6 +476,7 @@ | |||
;; vfncvtbf16 vector narrowing single floating-point to brain floating-point instruction | |||
;; vfwcvtbf16 vector widening brain floating-point to single floating-point instruction | |||
;; vfwmaccbf16 vector BF16 widening multiply-accumulate | |||
;; vqmacc vector matrix integer multiply-add instructions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
2c9c076
to
1f32394
Compare
Hi, @kito-cheng and @pz9115 : |
Hi, @kito-cheng : |
No description provided.