feat: Set up based on sse2neon #1

Merged 6 commits on Nov 1, 2023
12 changes: 12 additions & 0 deletions .clang-format
@@ -0,0 +1,12 @@
ColumnLimit: 80
BraceWrapping:
  AfterFunction: true
  AfterNamespace: true
  AfterStruct: true
  AfterClass: true
  AfterControlStatement: true
  AfterEnum: true
  AfterUnion: true
  AfterExternBlock: true
  SplitEmptyFunction: false
  SplitEmptyRecord: false
45 changes: 45 additions & 0 deletions .github/workflows/github_actions.yml
@@ -0,0 +1,45 @@
name: Github Actions

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  cross_compile_tests:
    runs-on: ubuntu-20.04
    steps:
      - name: checkout code
        uses: actions/checkout@v3.2.0
      - name: setup riscv toolchain
        run: |
          mkdir /opt/riscv
          export PATH=$PATH:/opt/riscv/bin
          wget https://github.com/riscv-collab/riscv-gnu-toolchain/releases/download/2023.10.18/riscv64-elf-ubuntu-20.04-gcc-nightly-2023.10.18-nightly.tar.gz
          sudo tar -xzf riscv64-elf-ubuntu-20.04-gcc-nightly-2023.10.18-nightly.tar.gz -C /opt/

      - name: run tests
        run: |
          export PATH=$PATH:/opt/riscv/bin
          sh scripts/cross-test.sh qemu

  check_test_cases:
    runs-on: ubuntu-20.04
    steps:
      - name: checkout code
        uses: actions/checkout@v3.2.0
      - name: build artifact
        run: |
          make test

  coding_style:
    runs-on: ubuntu-20.04
    steps:
      - name: checkout code
        uses: actions/checkout@v3.2.0
      - name: style check
        run: |
          sudo apt-get install -q -y clang-format-12
          sh scripts/check-format.sh
        shell: bash
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
*.exe
*.o
*.gch
tests/*.d
tests/main
.vs/
Debug/
Release/
*.log
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

-Copyright (c) 2023 Yang Hau
+Copyright (c) 2023 SSE2RVV Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
89 changes: 89 additions & 0 deletions Makefile
@@ -0,0 +1,89 @@
ifndef CC
    override CC = gcc
endif

ifndef CXX
    override CXX = g++
endif

ifndef CROSS_COMPILE
    processor := $(shell uname -m)
else # CROSS_COMPILE was set
    CC = $(CROSS_COMPILE)gcc
    CXX = $(CROSS_COMPILE)g++
    CXXFLAGS += -static
    LDFLAGS += -static

    check_riscv := $(shell echo | $(CROSS_COMPILE)cpp -dM - | grep " __riscv_xlen " | cut -c22-)
    uname_result := $(shell uname -m)
    ifeq ($(check_riscv),64)
        processor = rv64
    else ifeq ($(uname_result),rv64imafdc)
        processor = rv64
    else ifeq ($(check_riscv),32)
        processor = rv32
    else ifeq ($(uname_result),rv32i)
        processor = rv32
    else
        $(error Unsupported cross-compiler)
    endif

    ifeq ($(processor),$(filter $(processor),i386 x86_64))
        ARCH_CFLAGS = -maes -mpclmul -mssse3 -msse4.2
    else
        ARCH_CFLAGS = -march=$(processor)gcv_zba
    endif

    ifeq ($(SIMULATOR_TYPE), qemu)
        SIMULATOR += qemu-riscv64
        SIMULATOR_FLAGS = -cpu $(processor),v=true,zba=true,vlen=128
    else
        SIMULATOR = spike
        SIMULATOR_FLAGS = --isa=$(processor)gcv_zba
        PROXY_KERNEL = pk
    endif
endif

CXXFLAGS += -Wall -Wcast-qual -I. $(ARCH_CFLAGS)
LDFLAGS += -lm
OBJS = \
    tests/binding.o \
    tests/common.o \
    tests/impl.o \
    tests/main.o
deps := $(OBJS:%.o=%.o.d)

.SUFFIXES: .o .cpp
.cpp.o:
	$(CXX) -o $@ $(CXXFLAGS) -c -MMD -MF $@.d $<

EXEC = tests/main

$(EXEC): $(OBJS)
	$(CXX) $(LDFLAGS) -o $@ $^

test: tests/main
ifeq ($(processor),$(filter $(processor),rv32 rv64))
	$(CC) $(ARCH_CFLAGS) -c sse2rvv.h
endif
	$(SIMULATOR) $(SIMULATOR_FLAGS) $(PROXY_KERNEL) $^

build-test: tests/main
ifeq ($(processor),$(filter $(processor),rv32 rv64))
	$(CC) $(ARCH_CFLAGS) -c sse2rvv.h
endif

format:
	@echo "Formatting files with clang-format.."
	@if ! hash clang-format; then echo "clang-format is required to indent"; fi
	clang-format -i sse2rvv.h tests/*.cpp tests/*.h

.PHONY: clean check format

clean:
	$(RM) $(OBJS) $(EXEC) $(deps) sse2rvv.h.gch

clean-all: clean
	$(RM) *.log

-include $(deps)
94 changes: 94 additions & 0 deletions README.md
@@ -0,0 +1,94 @@
# sse2rvv

A C/C++ header file that converts Intel SSE intrinsics to RISC-V Vector (RVV) Extension intrinsics.

## Introduction

`sse2rvv` is a translator of Intel SSE (Streaming SIMD Extensions) intrinsics
to the [RISC-V Vector Extension](https://github.com/riscv/riscv-v-spec) (RVV),
shortening the time needed to get a working RISC-V program that can then be used to
extract profiles and to identify hot paths in the code.
The header file `sse2rvv.h` contains several of the functions provided by Intel
intrinsic headers such as `<xmmintrin.h>`, implemented with RVV-based counterparts
to reproduce the exact semantics of the intrinsics.

This project is based on [sse2neon](https://github.com/DLTcollab/sse2neon) and adapts it to RISC-V.

## Mapping and Coverage

Header file | Extension |
---|---|
`<mmintrin.h>` | MMX |
`<xmmintrin.h>` | SSE |
`<emmintrin.h>` | SSE2 |
`<pmmintrin.h>` | SSE3 |
`<tmmintrin.h>` | SSSE3 |
`<smmintrin.h>` | SSE4.1 |
`<nmmintrin.h>` | SSE4.2 |
`<wmmintrin.h>` | AES |

`sse2rvv` aims to support the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES extensions.

The goal is to deliver RVV equivalents for all widely used SSE intrinsics.
Be aware that only some SSE intrinsics map directly to a single
RVV intrinsic. Others lack a 1:1 mapping, meaning that
their equivalents are built from several RVV intrinsics.

For example, the SSE intrinsic `_mm_loadu_si128` maps directly to a single RVV unit-stride load (for example `__riscv_vle32_v_i32m1`),
but the SSE intrinsic `_mm_maddubs_epi16` has to be implemented with multiple RVV instructions.

### Floating-point compatibility

Some conversions require several RVV intrinsics and may therefore produce results that differ
from their SSE counterparts, because the two instruction sets do not follow identical IEEE-754 arithmetic rules.

## Usage

- Put the file `sse2rvv.h` into your source code directory.

- Locate the following SSE header files included in the code:
```C
#include <xmmintrin.h>
#include <emmintrin.h>
```
`{p,t,s,n,w}mmintrin.h` can be replaced as well.

- Replace them with:
```C
#include "sse2rvv.h"
```

- Explicitly specify platform-specific options to gcc/clang compilers.
  * On riscv64
  ```shell
  -march=rv64gcv_zba
  ```

## Run Built-in Test Suite

`sse2rvv` provides a unified interface for developing test cases. These test
cases are located in the `tests` directory, and the input data is specified at
runtime. Run the test cases with:
```shell
$ make test
```

## Reference
* [sse2neon](https://github.com/DLTcollab/sse2neon)
* [Intel Intrinsics Guide](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html)
* [Microsoft: x86 intrinsics list](https://learn.microsoft.com/en-us/cpp/intrinsics/x86-intrinsics-list)
* [Arm Neon Intrinsics Reference](https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
* [Neon Programmer's Guide for Armv8-A](https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/neon-programmers-guide-for-armv8-a)
* [NEON Programmer's Guide](https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf)
* [qemu/target/i386/ops_sse.h](https://github.com/qemu/qemu/blob/master/target/i386/ops_sse.h): Comprehensive SSE instruction emulation in C. Ideal for semantic checks.
* [Porting Takua Renderer to 64-bit ARM- Part 1](https://blog.yiningkarlli.com/2021/05/porting-takua-to-arm-pt1.html)
* [Porting Takua Renderer to 64-bit ARM- Part 2](https://blog.yiningkarlli.com/2021/07/porting-takua-to-arm-pt2.html)
* [Comparing SIMD on x86-64 and arm64](https://blog.yiningkarlli.com/2021/09/neon-vs-sse.html)
* [Port with SSE2Neon and SIMDe](https://developer.arm.com/documentation/102581/0200/Port-with-SSE2Neon-and-SIMDe)
* [Genomics: Optimizing the BWA aligner for Arm Servers](https://community.arm.com/arm-community-blogs/b/high-performance-computing-blog/posts/optimizing-genomics-and-the-bwa-aligner-for-arm-servers)
* [Bit twiddling with Arm Neon: beating SSE movemasks, counting bits and more](https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon)
* [C/C++ on Graviton](https://github.com/aws/aws-graviton-getting-started/blob/main/c-c%2B%2B.md)

## Licensing

`sse2rvv` is freely redistributable under the MIT License.
10 changes: 10 additions & 0 deletions scripts/check-format.sh
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

set -x

for file in ${SOURCES};
do
    clang-format ${file} > expected-format
    diff -u -p --label="${file}" --label="expected coding style" ${file} expected-format
done
exit $(clang-format --output-replacements-xml ${SOURCES} | egrep -c "</replacement>")
13 changes: 13 additions & 0 deletions scripts/cross-test.sh
@@ -0,0 +1,13 @@
#!/usr/bin/env bash

# Clang/LLVM is natively a cross-compiler.
# TODO: Do cross-compilation using Clang
# https://clang.llvm.org/docs/CrossCompilation.html
if [ $(printenv CXX | grep clang) ]; then
    exit
fi

set -x

make clean
make CROSS_COMPILE=riscv64-unknown-elf- SIMULATOR_TYPE=$1 test || exit 1 # riscv64