Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Sifive Xsfvqmacc extension intrinsics #5

Open
wants to merge 4 commits into
base: sifive-rvv-intrinsic
Choose a base branch
from

Conversation

yulong18
Copy link
Collaborator

No description provided.

@yulong18
Copy link
Collaborator Author

Hi, @kito-cheng and @monkchiang :
I create a new PR. I create a new temporary branch(sifive-rvv-intrinsic-temp),but I will delete it later.
I put the test case in the comment.
`#include "riscv_vector.h"

vint32m1_t test1(vint32m1_t vd, vint8m1_t vs1, vint8mf2_t vs2, size_t vl) {
return __riscv_sf_vqmacc_4x8x4_i32m1(vd, vs1, vs2, vl);
}`

@@ -1750,6 +1752,9 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =

{"xsfvcp", &gcc_options::x_riscv_sifive_subext, MASK_XSFVCP},
{"xsfcease", &gcc_options::x_riscv_sifive_subext, MASK_XSFCEASE},
{"xsfvqmaccqoq", &gcc_options::x_riscv_sifive_subext, MASK_XSFVQMACCQOQ},
{"xsfvqmaccdod", &gcc_options::x_riscv_sifive_subext, MASK_XSFVQMACCDOD},

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this blank line

@@ -215,6 +215,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
builtin_define_with_int_value ("__riscv_th_v_intrinsic",
riscv_ext_version_value (0, 11));


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this blank line

@@ -1321,4 +1354,5 @@ SHAPE(seg_fault_load, seg_fault_load)
SHAPE(crypto_vv, crypto_vv)
SHAPE(crypto_vi, crypto_vi)
SHAPE(crypto_vv_no_op_type, crypto_vv_no_op_type)
SHAPE(sf_vqmacc,sf_vqmacc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SHAPE(sf_vqmacc,sf_vqmacc)
SHAPE(sf_vqmacc, sf_vqmacc)

Copy link
Member

@kito-cheng kito-cheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add testcase?

Also you could copy /contrib/clang-format to /.clang-format, then you can use clang-format or git clang-format to format your code

return e.use_exact_insn (
code_for_pred_fnr_clip (ZERO_EXTEND, e.vector_mode ()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to distinguish between x and xu here?

Comment on lines 728 to 754
gcc_unreachable ();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
gcc_unreachable ();
}
}
}
gcc_unreachable ();
}

gcc_unreachable in wrong level I think?

Comment on lines 2685 to 2686
static CONSTEXPR const vfnrclip x_obj;
static CONSTEXPR const vfnrclip xu_obj;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sf_vfnrclip_x_obj;
sf_vfnrclip_xu_obj;

Comment on lines 975 to 977
if (e.op_info->op == OP_TYPE_4x8x4)
return e.use_widen_ternop_insn (
code_for_pred_quad_mul_plusus_qoq (e.vector_mode ()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

Comment on lines 2876 to 2879
static CONSTEXPR const vqmacc vqmacc_obj;
static CONSTEXPR const vqmaccu vqmaccu_obj;
static CONSTEXPR const vqmaccsu vqmaccsu_obj;
static CONSTEXPR const vqmaccsu vqmaccus_obj;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add sf_ prefix

Comment on lines 1341 to 1339
/* vop_v --> vop_v_<type>. */
b.append_name (type_suffixes[instance.type.index].vector);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

Comment on lines 1351 to 1349
if (overloaded_p && (instance.pred == PRED_TYPE_tu || instance.pred == PRED_TYPE_mu ||
instance.pred == PRED_TYPE_tumu))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

yulong18 pushed a commit that referenced this pull request Nov 4, 2024
This patch folds svindex with constant arguments into a vector series.
We implemented this in svindex_impl::fold using the function build_vec_series.
For example,
svuint64_t f1 ()
{
  return svindex_u642 (10, 3);
}
compiled with -O2 -march=armv8.2-a+sve, is folded to {10, 13, 16, ...}
in the gimple pass lower.
This optimization benefits cases where svindex is used in combination with
other gimple-level optimizations.
For example,
svuint64_t f2 ()
{
    return svmul_x (svptrue_b64 (), svindex_u64 (10, 3), 5);
}
has previously been compiled to
f2:
        index   z0.d, riscvarchive#10, #3
        mul     z0.d, z0.d, #5
        ret
Now, it is compiled to
f2:
        mov     x0, 50
        index   z0.d, x0, riscvarchive#15
        ret

We added test cases checking
- the application of the transform during gimple for constant arguments,
- the interaction with another gimple-level optimization.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svindex_impl::fold): Add constant folding.

gcc/testsuite/
	* gcc.target/aarch64/sve/index_const_fold.c: New test.
yulong18 pushed a commit that referenced this pull request Nov 4, 2024
We can make use of the integrated rotate step of the XAR instruction
to implement most vector integer rotates, as long we zero out one
of the input registers for it.  This allows for a lower-latency sequence
than the fallback SHL+USRA, especially when we can hoist the zeroing operation
away from loops and hot parts.  This should be safe to do for 64-bit vectors
as well even though the XAR instructions operate on 128-bit values, as the
bottom 64-bit results is later accessed through the right subregs.

This strategy is used whenever we have XAR instructions, the logic
in aarch64_emit_opt_vec_rotate is adjusted to resort to
expand_rotate_as_vec_perm only when it's expected to generate a single REV*
instruction or when XAR instructions are not present.

With this patch we can gerate for the input:
v4si
G1 (v4si r)
{
    return (r >> 23) | (r << 9);
}

v8qi
G2 (v8qi r)
{
  return (r << 3) | (r >> 5);
}
the assembly for +sve2:
G1:
        movi    v31.4s, 0
        xar     z0.s, z0.s, z31.s, riscvarchive#23
        ret

G2:
        movi    v31.4s, 0
        xar     z0.b, z0.b, z31.b, #5
        ret

instead of the current:
G1:
        shl     v31.4s, v0.4s, 9
        usra    v31.4s, v0.4s, 23
        mov     v0.16b, v31.16b
        ret
G2:
        shl     v31.8b, v0.8b, 3
        usra    v31.8b, v0.8b, 5
        mov     v0.8b, v31.8b
        ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

gcc/

	* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add
	generation of XAR sequences when possible.

gcc/testsuite/

	* gcc.target/aarch64/rotate_xar_1.c: New test.
@yulong18 yulong18 force-pushed the sifive-rvv-intrinsic-temp branch from 22a7f1e to 584fce9 Compare November 4, 2024 10:56
Comment on lines 580 to 582
/* Return true if intrinsics maybe require qfrm operand. */
virtual bool may_require_qfrm_p () const;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's qfrm ?

Comment on lines 859 to 866
/* We choose to return false by default since most of the intrinsics does
not need qfrm operand. */
inline bool
function_base::may_require_qfrm_p () const
{
return false;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Comment on lines 1 to 6
#include "riscv_vector.h"

vint8mf8_t test1(float vs1, vfloat32mf2_t vs2, size_t vl) {
return __riscv_sf_vfnrclip_x_f_qf_i8mf8(vs2, vs1, vl);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you reference this file https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.target/riscv/target-attr-01.c and add correct test directive?
e.g.

/* { dg-do compile } */
/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
/* { dg-options "-march=rv64gcv_xsfvfnrclipxfqf -O2 -mabi=lp64d" } */
/* { dg-final { check-function-bodies "**" "" } } */

and add some check within the testcase like:

/*
** foo:
**   ...
**   vsetvli\s*x0, a0, e32, m1, ta, ta
**   sf.vfnrclip.x.f.qf\s*fa0,v8
**   ...
*/

And don't forgot sf_vqmacc.c

Comment on lines 19 to 21
vint8m2_t test2(float vs1, vfloat32m8_t vs2, size_t vl) {
return __riscv_sf_vfnrclip_x_f_qf_i8m2(vs2, vs1, vl);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also testcase for __riscv_sf_vfnrclip_xu_f_qf_*, not really need all combination but please add few to improve the coverage

@@ -381,6 +381,7 @@
;; vidiv vector single-width integer divide instructions
;; viwmul vector widening integer multiply instructions
;; vimuladd vector single-width integer multiply-add instructions
;; vsfmuladd vector matrix integer multiply-add instructions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, please put this in the end of the comment, don't mixed with standard instruction

@@ -390,6 +391,7 @@
;; vsmul vector single-width fractional multiply with rounding and saturation instructions
;; vsshift vector single-width scaling shift instructions
;; vnclip vector narrowing fixed-point clip instructions
;; vsfclip vector fp32 to int8 ranged clip instructions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@yulong18 yulong18 force-pushed the sifive-rvv-intrinsic-temp branch from 584fce9 to da24b1c Compare November 21, 2024 07:00
@yulong18
Copy link
Collaborator Author

Hi, @kito-cheng :
There are two new commits that add the support for vqmacc. I tested it and all testcases passed. Please review it. Thanks!
Today, I also will add the code that support vfnrclipxfqf and vcix extensions for this PR.

@yulong18 yulong18 force-pushed the sifive-rvv-intrinsic-temp branch from da24b1c to 16f13a0 Compare November 21, 2024 07:23
@pz9115 pz9115 requested a review from kito-cheng November 21, 2024 08:43
Comment on lines 1339 to 1340
/* According to SIFIVE vector-intrinsic-doc, it adds suffixes
for vop_m C++ overloaded API.*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment seems not updated?

Comment on lines 751 to 753
return e.use_exact_insn (
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));
gcc_unreachable ();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return e.use_exact_insn (
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));
gcc_unreachable ();
return e.use_exact_insn (
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));

Comment on lines 727 to 729
return e.use_exact_insn (
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));
gcc_unreachable ();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return e.use_exact_insn (
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));
gcc_unreachable ();
return e.use_exact_insn (
code_for_pred_sf_vfnrclip_x_f_qf (UNSPEC, e.vector_mode ()));

@@ -869,6 +918,105 @@ class vwmaccus : public function_base
}
};

/* Implements vqmacc. */
class vqmacc : public function_base
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class vqmacc : public function_base
class sf_vqmacc : public function_base

};

/* Implements vqmaccu. */
class vqmaccu : public function_base
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class vqmaccu : public function_base
class sf_vqmaccu : public function_base

};

/* Implements vqmaccus. */
class vqmaccus : public function_base
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class vqmaccus : public function_base
class sf_vqmaccus : public function_base

Comment on lines 112 to 115
extern const function_base *const sf_vqmacc;
extern const function_base *const sf_vqmaccu;
extern const function_base *const sf_vqmaccsu;
extern const function_base *const sf_vqmaccus;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you create a new file sifive-vector-builtins-bases.h to hold sifive intrinsic

Comment on lines 708 to 710
/* Implements vfnrclip. */
template <int UNSPEC, enum frm_op_type FRM_OP = NO_FRM>
class vfnrclip_x_f_qf : public function_base
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create sifive-vector-builtins-bases.cc to hold those intrinsic

@@ -390,6 +390,7 @@
;; vsmul vector single-width fractional multiply with rounding and saturation instructions
;; vsshift vector single-width scaling shift instructions
;; vnclip vector narrowing fixed-point clip instructions
;; vfnrclip vector fp32 to int8 ranged clip instructions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not standard instruction, so plz create a new section in the comment rather than mixed in the std section, also plz prefixed with sf_

@@ -475,6 +476,7 @@
;; vfncvtbf16 vector narrowing single floating-point to brain floating-point instruction
;; vfwcvtbf16 vector widening brain floating-point to single floating-point instruction
;; vfwmaccbf16 vector BF16 widening multiply-accumulate
;; vqmacc vector matrix integer multiply-add instructions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@yulong18 yulong18 force-pushed the sifive-rvv-intrinsic-temp branch from 2c9c076 to 1f32394 Compare November 22, 2024 09:35
@yulong18
Copy link
Collaborator Author

Hi, @kito-cheng and @pz9115 :
We add the new commits for vqmacc.
Please review it. If there are no comment, We will send patch to upstream.

@yulong18
Copy link
Collaborator Author

Hi, @kito-cheng :
We have rebased the code. Please help review the vqmacc code. Thanks.
We want send the vqmacc patchset to upstream in few days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants