654 Commits
0.1.0 ... 0.2.0

Author SHA1 Message Date
Andrew Kelley
f073923ea0 Release 0.2.0 2018-03-15 09:15:05 -04:00
Andrew Kelley
50e25f6cec add missing docs for setAlignStack builtin 2018-03-14 21:51:06 -04:00
Andrew Kelley
efebb6d341 fix tests broken by previous commit 2018-03-14 03:37:54 -04:00
Andrea Orru
c828c23f71 Tests for zero-bit field compiler error 2018-03-13 22:07:40 -07:00
Andrea Orru
7ac44037db Compiler error when taking @offsetOf of void struct member
closes #739
2018-03-13 21:20:06 -07:00
Andrea Orru
2a6ad23b52 Merge branch 'master' of https://github.com/zig-lang/zig 2018-03-13 16:16:22 -07:00
Andrew Kelley
7f7823e23c fix casting a function to a pointer causing compiler crash
closes #777
2018-03-13 19:15:20 -04:00
Andrea Orru
2cdd50c9b2 Panic instead of segfault when returning generic type from functions
closes #829
2018-03-13 16:14:21 -07:00
Marc Tiehuis
d6e84e325b Add WebAssembly output workaround for LLVM 6 2018-03-13 21:53:42 +13:00
Andrew Kelley
bcce77700f some return types disqualify comptime fn call caching
closes #828
2018-03-12 12:56:25 -04:00
Andrew Kelley
5834ff0cc5 don't memoize comptime fn calls that access comptime mutable state
closes #827
2018-03-12 08:35:41 -04:00
Andrew Kelley
1bf2810f33 fix comptime slicing not preserving comptime mutability
* fix comptime slice of slice not preserving mutability
  of the comptime data
* fix comptime slice of pointer not preserving mutability
  of the comptime data

closes #826
2018-03-12 01:21:10 -04:00
Andrew Kelley
49c3922037 fix incorrect setEvalBranchQuota compile error
closes #688
2018-03-12 00:08:52 -04:00
Andrea Orru
c18059a3dd Merge branch 'master' of https://github.com/zig-lang/zig 2018-03-10 16:59:53 -08:00
Andrea Orru
d0621391bc zen-specific: main -> _start 2018-03-10 16:59:28 -08:00
Andrew Kelley
5bc4f1e3f1 xml2 workaround is relevant for linux too 2018-03-10 18:23:08 -05:00
Andrea Orru
10fb1f2730 Merge branch 'test-ci' 2018-03-10 13:13:48 -08:00
Andrea Orru
152b408934 Simplify intrusive linked list test 2018-03-10 12:20:29 -08:00
Andrew Kelley
e4fd3fd52b workaround for llvm-config missing xml2 2018-03-10 14:48:41 -05:00
Andrew Kelley
6288ad865c change 5 to 6 in travis osx scripts 2018-03-10 14:36:59 -05:00
Andrew Kelley
84e952c230 fix await multithreaded data race
coro return was reading from a value that coro await was
writing to. that wasn't how it was designed to work, it
was an implementation mistake.

this commit also has some work-in-progress code for fixing
error return traces across suspend points.
2018-03-10 01:38:40 -05:00
Andrew Kelley
3b3649b86f refactor stack trace code to remove global state 2018-03-10 01:38:40 -05:00
Andrew Kelley
60b2031831 improvements to stack traces
* @panic generates an error return trace
* printing an error return trace no longer interferes with
  normal stack traces.
* instead of ignore_frame_count, we look at the return address
  when you call panic, and that's the first stack trace function.
  this makes stack traces much cleaner - the error return trace
  flows gracefully into the stack trace
2018-03-10 01:38:40 -05:00
Andrew Kelley
20011a7a1c add behavior test for coroutine frame allocation failure 2018-03-10 01:38:40 -05:00
Andrew Kelley
61a02d9d1e omit pad zeroes in debug stack traces 2018-03-10 01:38:40 -05:00
Andrea Orru
f25c1c6858 Fixed syntax errors in linux-i386 syscalls 2018-03-09 22:25:21 -08:00
Andrea Orru
70c3008a00 Added 6 parameters syscalls for zen 2018-03-09 22:24:52 -08:00
Marc Tiehuis
7a893691c0 Unroll Sha3 inner loop
Issue #699 since fixed. Nearly a x3 perf improvement.

Using --release-fast.

Sha3_256 (before): 96 Mb/s
Sha3_256  (after): 267 Mb/s

Sha3_512 (before): 53 Mb/s
Sha3_512  (after): 142 Mb/s

No real gains from unrolling other initialization loops in crypto
functions so have been left as is.
2018-03-10 10:00:07 +13:00
Andrew Kelley
5a7a0e8518 update to SoftFloat-3e
closes #823
2018-03-09 15:06:06 -05:00
Andrew Kelley
6db9be8900 don't memoize comptime functions if they can mutate state via parameters
closes #639
2018-03-09 14:20:44 -05:00
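The three memoization fixes above (bcce77700f, 5834ff0cc5, 6db9be8900) share one idea: caching a call by its arguments is only sound when the call has no side effects. The following is a language-neutral sketch in Python, not Zig and not the compiler's actual logic; `memoize`, `bump`, and the list-as-counter are hypothetical names chosen for illustration.

```python
def memoize(fn):
    """Naive cache keyed by the identity of the first argument plus the rest."""
    cache = {}

    def wrapper(counter, *args):
        key = (id(counter), args)
        if key not in cache:
            cache[key] = fn(counter, *args)
        return cache[key]

    return wrapper


def bump(counter, step):
    """Mutates state reachable through a parameter, then returns the new value."""
    counter[0] += step
    return counter[0]


cached_bump = memoize(bump)

# Without caching, the body runs on every call and the state advances twice.
plain = [0]
plain_results = [bump(plain, 1), bump(plain, 1)]

# With caching, the second call is answered from the cache, so the mutation
# is silently skipped and the caller observes a stale result.
shared = [0]
cached_results = [cached_bump(shared, 1), cached_bump(shared, 1)]
```

Here `plain_results` and `cached_results` differ, which is the kind of divergence that makes a call with mutable parameters unsafe to cache.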
Andrew Kelley
aaf2230ae8 fix partial inlining of binary math operator using old value
the code was abusing the internal IR API. fixed now.

closes #699
2018-03-08 17:15:55 -05:00
Andrew Kelley
028ec0f2c3 enums with 1 field and explicit tag type still get the tag type
closes #820
2018-03-08 15:22:42 -05:00
Andrew Kelley
aa9902b586 translate-c: add missing case labels 2018-03-08 11:47:07 -05:00
Andrew Kelley
2db28ea849 travis ci: update ubuntu llvm repo and CC,CXX env vars to 6 2018-03-08 11:46:47 -05:00
Andrew Kelley
3200ebc2ea Merge branch 'llvm6'
Zig now depends on LLVM 6.0.0.

The latest commit that depends on LLVM 5.0.1 is
2e010c60ae.
2018-03-08 10:59:54 -05:00
Andrew Kelley
b57cb04afc Merge remote-tracking branch 'origin/master' into llvm6 2018-03-08 10:59:24 -05:00
Jimmi Holst Christensen
2e010c60ae Translate C now correctly converts ints, floats, ptrs and enums to bools
* Boolean "and" and "or" should also work with these types.
* This new method also simplifies the output code.
2018-03-08 15:34:00 +01:00
Jimmi Holst Christensen
b2887620f3 Translate C will now handle ignored return values 2018-03-08 13:15:30 +01:00
Jimmi Holst Christensen
689e241ff8 Merge branch 'master' of github.com:zig-lang/zig 2018-03-08 10:29:43 +01:00
Jimmi Holst Christensen
51b2f1b80b Translate C can now translate switch statements again 2018-03-08 10:29:29 +01:00
Andrew Kelley
790aaeacae add compile error for using @tagName on extern union
closes #742
2018-03-07 14:35:48 -05:00
Jimmi Holst Christensen
bb80daf509 Ast Render no longer outputs erroneous semicolon
closes #813
2018-03-07 10:39:32 +01:00
Andrew Kelley
d96dd5bc32 fix missing compile error for returning error from void async function
closes #799
2018-03-06 21:44:27 -05:00
Andrew Kelley
6b5cfd9d99 turn assertion into compile error for using var as return type
closes #758
2018-03-06 20:41:49 -05:00
Andrew Kelley
eff3530dfa var is no longer a pseudo-type, it is syntax
closes #779
2018-03-06 18:31:31 -05:00
Andrew Kelley
44ae891bd7 fix assertion when taking slice of zero-length array
closes #788
2018-03-06 17:19:45 -05:00
Andrew Kelley
cc0f660ad2 unless hf is specified in target environ, assume soft floating point
closes #804
2018-03-06 16:57:41 -05:00
Andrew Kelley
5d5820029d fix broken tests from previous commit 2018-03-06 16:46:45 -05:00
Andrew Kelley
07e47c058c ptrCast builtin now gives an error for removing const qualifier
closes #384
2018-03-06 16:37:03 -05:00
Andrew Kelley
46e258c9f7 Merge pull request #815 from Hejsil/more-translate-c
Translate C now handles bools better
2018-03-06 10:43:52 -05:00
Andrew Kelley
c3807dfb34 remove value judgement from std lib API docs
documentation should be purely technical, and not contain opinions about
how easy or hard something is.
2018-03-06 10:41:07 -05:00
Jimmi Holst Christensen
1d378d8f26 Removed fixed todo 2018-03-06 12:33:09 +01:00
Jimmi Holst Christensen
5ab25798e3 We now also use trans_to_bool_expr on bool not 2018-03-06 12:04:14 +01:00
Jimmi Holst Christensen
bf47cf418a expr to bool is now its own function.
* Now while and for loops work on ints and floats, like if statements
* This fixes the loop problem in #813
2018-03-06 11:57:51 +01:00
Jimmi Holst Christensen
61ecc48671 Added appropriate TODO comment to UO_LNot 2018-03-06 11:15:13 +01:00
Jimmi Holst Christensen
ed1386eeff Simple translation of UO_LNot 2018-03-06 11:13:10 +01:00
Andrew Kelley
d34d36619e Merge pull request #814 from jacobdufault/utf8-view
Make Utf8View public, add comments, and make iterator lowercase.
2018-03-06 01:42:04 -05:00
Jacob Dufault
8fd7e9115c Make Utf8View public, add comments, and make iterator lowercase. 2018-03-05 21:42:01 -08:00
Joshua Olson
c787837ce5 Clarify what is meant by 'libraries' (#808) 2018-03-04 19:26:16 -05:00
Joshua Olson
db18d38a43 Fix Linux gcc requirement (#807)
g++ may be a separate package. I had this problem on Fedora.
2018-03-04 17:46:17 -05:00
Andrew Kelley
73a306e2fa fix conflict artifact accidentally in appveyor script 2018-03-03 17:44:41 -05:00
Andrew Kelley
7ee1b88042 add llvm 6.0.0 binaries to appveyor cache 2018-03-03 16:43:57 -05:00
Andrew Kelley
1c244d34b3 Merge branch 'master' into llvm6 2018-03-03 16:30:59 -05:00
Andrew Kelley
56645c1701 std.debug.dwarf supports line number version 4
fixes stack traces for llvm6 generated zig programs
2018-03-02 16:26:22 -05:00
Andrew Kelley
101b7745c4 add optnone noinline to async functions
this works around LLVM optimization assertion failures.
https://bugs.llvm.org/show_bug.cgi?id=36578

closes #800
2018-03-02 13:40:03 -05:00
Andrew Kelley
a217c764db Merge remote-tracking branch 'origin/master' into llvm6 2018-03-01 22:25:15 -05:00
Andrew Kelley
7d494b3e7b Merge branch 'async'
closes #727
2018-03-01 21:55:15 -05:00
Andrew Kelley
de5c0c9f40 Merge remote-tracking branch 'origin/master' into async 2018-03-01 20:47:35 -05:00
Andrew Kelley
6bade0b825 coroutines: add await early test case 2018-03-01 16:17:38 -05:00
Andrew Kelley
8a0e1d4c02 await keyword works 2018-03-01 15:46:35 -05:00
Andrew Kelley
a7c87ae1e4 fix not casting result of llvm.coro.promise 2018-03-01 10:23:47 -05:00
Andrew Kelley
253d988e7c implementation of await
but it has bugs
2018-03-01 03:28:13 -05:00
Andrew Kelley
834e992a7c add test for coroutine suspend with block 2018-02-28 22:26:26 -05:00
Andrew Kelley
8429d4ceac implement coroutine resume 2018-02-28 22:18:48 -05:00
Andrew Kelley
c622766156 async function fulfills promise atomically 2018-02-28 21:48:20 -05:00
Andrew Kelley
807a5e94e9 add atomicrmw builtin function 2018-02-28 21:19:51 -05:00
Andrew Kelley
36eadb569a run coroutine tests only in Debug mode
LLVM 5.0.1, 6.0.0, and trunk crash when attempting to optimize coroutine code.
So, Zig does not support ReleaseFast or ReleaseSafe for coroutines yet.
Luckily, Clang users are running into the same crashes, so folks from the LLVM
community are working on fixes. If we're really lucky they'll be fixed in 6.0.1.
Otherwise we can hope for 7.0.0.
2018-02-28 18:56:26 -05:00
Andrew Kelley
58dc2b719c better coroutine codegen, now passing first coro test
we have to use the Suspend block with llvm.coro.end to
return from the coro
2018-02-28 18:22:43 -05:00
Andrew Kelley
ad2a29ccf2 break the data dependencies that llvm coro transforms can't handle
my simple coro test program builds now

see #727
2018-02-28 16:47:13 -05:00
Andrew Kelley
026aebf2ea another workaround for llvm coroutines
this one doesn't work either
2018-02-28 04:01:22 -05:00
Andrew Kelley
6568be575c Merge branch 'bnoordhuis-fix795' 2018-02-28 00:29:20 -05:00
Andrew Kelley
556f22a751 different way of fixing previous commit
get_fn_type doesn't need the complete parameter type, it
can just ensure zero bits known.
2018-02-28 00:28:26 -05:00
Andrew Kelley
1b8a241f6f Merge branch 'fix795' of https://github.com/bnoordhuis/zig into bnoordhuis-fix795 2018-02-28 00:22:53 -05:00
Andrew Kelley
0f449a3ec1 Merge pull request #796 from bnoordhuis/fix731-more
allow implicit cast from &const to ?&const &const
2018-02-27 23:55:03 -05:00
Ben Noordhuis
90598b4631 fix assert on self-referencing function ptr field
The construct `struct S { f: fn(S) void }` is not legal because structs
are not copyable but it should not result in an ICE.

Fixes #795.
2018-02-28 00:56:00 +01:00
Andrew Kelley
d243453862 Revert "llvm coroutine workaround: sret functions return sret pointer"
This reverts commit 132e604aa3.

this workaround didn't work either
2018-02-27 17:47:18 -05:00
Andrew Kelley
138d6f9093 revert workaround for alloc and free as coro params
reverts 4ac6c4d6bf

the workaround didn't work
2018-02-27 17:46:13 -05:00
Andrew Kelley
132e604aa3 llvm coroutine workaround: sret functions return sret pointer 2018-02-27 17:12:53 -05:00
Andrew Kelley
6e2a67724c Revert "another llvm workaround for getelementptr"
This reverts commit c2f5634fb3.

It doesn't work. With this, LLVM moves the allocate fn call
to after llvm.coro.begin
2018-02-27 14:58:02 -05:00
Andrew Kelley
c2f5634fb3 another llvm workaround for getelementptr 2018-02-27 14:57:49 -05:00
Andrew Kelley
439621e44a remove signal handling stuff from std.os.ChildProcess 2018-02-27 11:14:14 -05:00
Andrew Kelley
4e43bde924 workaround for llvm: delete coroutine allocation elision
maybe this can be reverted, but it seems to be related
to llvm's coro transformations crashing.

See #727
2018-02-26 21:31:00 -05:00
Andrew Kelley
4ac6c4d6bf workaround llvm coro transformations
by making alloc and free functions be parameters to async
functions instead of using getelementptr in the DynAlloc block

See #727
2018-02-26 21:14:15 -05:00
Ben Noordhuis
9aa65c0e8e allow implicit cast from &const to ?&const &const
Allow implicit casts from n-th degree const pointers to nullable const
pointers of degree n+1.  That is:

    fn f() void {
        const s = S {};
        const p = &s;
        g(p);   // Works.
        g(&p);  // So does this.
    }

    fn g(_: ?&const &const S) void {  // Nullable 2nd degree const ptr.
    }

Fixes #731 some more.
2018-02-26 19:56:26 +01:00
Andrew Kelley
1eecfdaa9b Merge pull request #785 from bnoordhuis/fix731
allow implicit cast from `S` to `?&const S`
2018-02-26 03:20:46 -05:00
Andrew Kelley
3e86fb500d implement coroutine suspend
see #727
2018-02-26 02:46:21 -05:00
Andrew Kelley
c60496a297 parse await and suspend syntax
See #727
2018-02-26 00:04:11 -05:00
Andrew Kelley
6fef7406c8 move coroutine init code to after coro.begin 2018-02-25 20:29:14 -05:00
Andrew Kelley
6b436146a8 fix invalid memory write in coroutines implementation 2018-02-25 20:28:44 -05:00
Andrew Kelley
6cbea99ed6 async functions are allowed to accept zig types 2018-02-25 20:27:53 -05:00
Andrew Kelley
b018c64ca2 add coroutine LLVM passes 2018-02-25 18:09:39 -05:00
Andrew Kelley
fe354ebb5c coroutines: fix llvm error of instruction not dominating uses
See #727
2018-02-25 17:57:05 -05:00
Andrew Kelley
704a8acb59 fix handle_is_ptr for promise type 2018-02-25 17:34:18 -05:00
Andrew Kelley
83f8906449 codegen for coro_resume instruction
See #727
2018-02-25 17:34:05 -05:00
Andrew Kelley
4eac75914b codegen for coro_free instruction
See #727
2018-02-25 16:46:01 -05:00
Andrew Kelley
d2d2ba10e9 codegen for coro_end instruction
See #727
2018-02-25 16:40:00 -05:00
Andrew Kelley
0cf327eb17 codegen for coro_suspend instruction
See #727
2018-02-25 16:29:07 -05:00
Andrew Kelley
d0f2eca106 codegen for coro_begin instruction
See #727
2018-02-25 16:22:19 -05:00
Andrew Kelley
79f1ff574b codegen for coro_alloc_fail instruction
See #727
2018-02-25 16:15:14 -05:00
Andrew Kelley
bced3fb64c codegen for get_implicit_allocator instruction
See #727
2018-02-25 16:05:10 -05:00
Andrew Kelley
93cbd4eeb9 codegen for coro_alloc and coro_size instructions
See #727
2018-02-25 15:20:31 -05:00
Andrew Kelley
9f6c5a20de codegen for coro_id instruction
See #727
2018-02-25 15:10:29 -05:00
Andrew Kelley
7567448b91 codegen for cancel
See #727
2018-02-25 14:47:58 -05:00
Andrew Kelley
05bf666eb6 codegen for calling an async function
See #727
2018-02-25 02:47:31 -05:00
Marc Tiehuis
08d595b472 Add utf8 string view 2018-02-24 11:32:01 -07:00
Andrew Kelley
8db7a1420f update errors section of docs
closes #768
2018-02-23 20:43:47 -05:00
Andrew Kelley
4955c4b8f9 update C headers to clang 6.0.0rc3 2018-02-23 13:15:16 -05:00
Andrew Kelley
1ba6e1641a LLD patch: workaround for buggy MACH-O code
This reapplies 1a1414fc42
to the embedded LLD.
2018-02-23 13:05:17 -05:00
Andrew Kelley
a33b689f2c update embedded LLD to 6.0.0rc3 2018-02-23 13:04:47 -05:00
Andrew Kelley
9cfd7dea19 Merge remote-tracking branch 'origin/master' into llvm6 2018-02-23 12:56:41 -05:00
Andrew Kelley
78bc62fd34 Revert "workaround on windows for llvm6 missing advapi32.lib in llvm-config"
This reverts commit eaac218d59.

This is fixed now in llvm6 rc3
2018-02-23 12:55:58 -05:00
Andrew Kelley
40dbcd09da fix type_is_codegen_pointer being used incorrectly
The names of these functions should probably change, but at least
the semantics are correct now:
 * type_is_codegen_pointer - the type is either a fn, ptr, or promise
 * get_codegen_ptr_type -
   - ?&T and &T returns &T
   - ?promise and promise returns promise
   - ?fn()void and fn()void returns fn()void
   - otherwise returns nullptr
2018-02-23 12:49:21 -05:00
Ben Noordhuis
f11b948019 allow implicit cast from S to ?&const S
Allow implicit casts from container types to nullable const pointers to
said container type.  That is:

    fn f() void {
        const s = S {};
        g(s);   // Works.
        g(&s);  // So does this.
    }

    fn g(_: ?&const S) void {  // Nullable const pointer.
    }

Fixes #731.
2018-02-23 15:55:57 +01:00
Andrew Kelley
99985ad6fc implement Zig IR for async functions
See #727
2018-02-23 03:03:06 -05:00
Andrew Kelley
b66547e98c Merge pull request #783 from bnoordhuis/fix675
name types inside functions after variable
2018-02-22 14:26:45 -05:00
Ben Noordhuis
0845cbe277 name types inside functions after variable
Before this commit:

    fn f() []const u8 {
        const S = struct {};
        return @typeName(S);  // "f()", unexpected.
    }

And now:

    fn f() []const u8 {
        const S = struct {};
        return @typeName(S);  // "S", expected.
    }

Fixes #675.
2018-02-22 19:54:02 +01:00
Andrew Kelley
ca1b77b2d5 IR analysis for coro.begin
See #727
2018-02-22 11:54:27 -05:00
Andrew Kelley
88e7b9bf80 ir analysis for coro_id and coro_alloc
See #727
2018-02-22 09:36:58 -05:00
Andrew Kelley
37c07d4f3f coroutines: analyze get_implicit_allocator instruction
see #727
2018-02-22 09:30:55 -05:00
Andrew Kelley
b261da0672 add coroutine startup IR to async functions
See #727
2018-02-21 23:28:35 -05:00
Andrew Kelley
884b5fb4cf Merge branch 'bnoordhuis-macho' 2018-02-21 02:00:52 -05:00
Andrew Kelley
623466762e clean up mach-o stack trace code 2018-02-21 02:00:33 -05:00
Andrew Kelley
236bbe1183 implement IR analysis for async function calls
See #727
2018-02-21 00:52:20 -05:00
Andrew Kelley
65a51b401c add promise type
See #727
2018-02-20 16:42:14 -05:00
Andrew Kelley
a06f3c74fd parse async fn definitions
See #727
2018-02-20 00:31:52 -05:00
Andrew Kelley
3d58d7232a parse async fn calls and cancel expressions 2018-02-20 00:05:38 -05:00
Andrew Kelley
af10b0fec2 add async, await, suspend, resume, cancel keywords
See #727
2018-02-19 23:19:59 -05:00
Ben Noordhuis
2b35615ffb fix memory leak in std.debug.openSelfDebugInfo() 2018-02-19 23:11:11 +01:00
Ben Noordhuis
ab48934e9c add support for stack traces on macosx
Add basic address->symbol resolution support.  Uses symtab data from the
MachO image, not external dSYM data; that's left as a future exercise.

The net effect is that we can now map addresses to function names but
not much more.  File names and line number data will have to wait until
a future pull request.

Partially fixes #434.
2018-02-19 23:11:11 +01:00
Andrew Kelley
bde15cf080 improve std lib linux epoll API 2018-02-17 17:53:07 -05:00
Andrew Kelley
72ca2b214d ability to slice an undefined pointer at compile time if the len is 0 2018-02-16 15:22:29 -05:00
Andrew Kelley
cbbd6cfa1e add an assert to catch #777
asserting is better than segfaulting
2018-02-15 23:39:35 -05:00
Andrew Kelley
5f5880979e zig fmt supports simple line comments 2018-02-15 12:30:29 -05:00
Andrew Kelley
cc26148ba7 fix compiler crash when struct contains...
ptr to another struct which contains original struct
2018-02-15 12:14:20 -05:00
Andrew Kelley
1c1c0691cc fix crash when doing comptime float rem comptime int
closes #776
2018-02-14 23:12:51 -05:00
Andrew Kelley
ca597e2bfb std.zig.parser understands try. zig fmt respects a double line break. 2018-02-14 23:00:53 -05:00
Andrew Kelley
9fa35adbd4 fix sometimes not type checking function parameters
closes #774

regression introduced in cfb2c67692
2018-02-14 16:24:43 -05:00
Andrew Kelley
629f134d38 std.zig.parser understands inferred return type and error inference 2018-02-14 15:50:40 -05:00
Andrew Kelley
e8d81c5acf fix build broken by previous commit 2018-02-14 13:55:06 -05:00
Andrew Kelley
d790670f4c self hosted parser: support string literals 2018-02-14 13:43:05 -05:00
Andrew Kelley
1a53c648ed self hosted parser supports builtin fn call with no args 2018-02-14 09:45:10 -05:00
Andrew Kelley
e7ab2bc553 Merge remote-tracking branch 'origin/master' into llvm6 2018-02-13 11:53:20 -05:00
Andrew Kelley
c721354b73 correct doc comment in self hosted parser 2018-02-13 11:17:26 -05:00
Andrew Kelley
02f70cda8a zig_llvm.cpp uses new(std::nothrow)
This fixes a mismatched malloc/delete because
we were allocating with malloc and then llvm was
freeing with delete.
2018-02-13 10:54:46 -05:00
Andrew Kelley
2dcff95bd2 self hosted: add tokenizer test fix eof handling 2018-02-13 10:28:55 -05:00
Andrew Kelley
dfbb8254ca fix self hosted tokenizer handling of EOF 2018-02-12 21:26:15 -05:00
Andrew Kelley
7903a758a4 Merge remote-tracking branch 'origin/master' into llvm6 2018-02-12 17:00:02 -05:00
Andrew Kelley
b4e44c4e80 self hosted parser tests every combination of memory allocation failure 2018-02-12 13:31:50 -05:00
Andrew Kelley
eaac218d59 workaround on windows for llvm6 missing advapi32.lib in llvm-config 2018-02-12 11:05:28 -05:00
Andrew Kelley
491d818f17 Merge remote-tracking branch 'origin/master' into llvm6 2018-02-12 10:48:02 -05:00
Andrew Kelley
ec0846a00f std.heap.ArenaAllocator: fix incorrectly activating safety check 2018-02-12 03:21:18 -05:00
Andrew Kelley
227ead54be back to malloc instead of aligned_alloc for c_allocator
it seems that a 7-year-old standard is still too new for the
libc variants that are ubiquitous

(tests failing on macos for not providing C11 ABI)
2018-02-12 03:15:12 -05:00
Andrew Kelley
4a4ea92cf3 remove std.heap.IncrementingAllocator
Use std.heap.FixedBufferAllocator combined with
std.heap.DirectAllocator instead.

std.mem.FixedBufferAllocator is moved to std.heap.FixedBufferAllocator
2018-02-12 02:44:31 -05:00
Andrew Kelley
445b03384a introduce std.heap.ArenaAllocator and std.heap.DirectAllocator
* DirectAllocator does the underlying syscall for every allocation.
* ArenaAllocator takes another allocator as an argument and
  allocates bytes up front, falling back to DirectAllocator with
  increasingly large allocation sizes, to avoid calling it too often.
  Then the entire arena can be freed at once.

The self hosted parser is updated to take advantage of ArenaAllocator
for the AST that it returns. This significantly reduces the complexity
of cleanup code.

docgen and build runner are updated to use the combination of
ArenaAllocator and DirectAllocator instead of IncrementingAllocator,
which is now deprecated in favor of FixedBufferAllocator combined
with DirectAllocator.

The C allocator calls aligned_alloc instead of malloc, in order to
respect the alignment parameter.

Added asserts in Allocator to ensure that implementors of the
interface return slices of the correct size.

Fixed a bug in Allocator when you call realloc to grow the allocation.
2018-02-12 02:14:44 -05:00
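The arena idea described in 445b03384a can be sketched as follows. This is a Python illustration under stated assumptions, not the std.heap implementation: the class name `ArenaSketch`, the `child_alloc` callable, the 64-byte starting size, and the doubling policy are all hypothetical (the commit message only says the request sizes are "increasingly large").

```python
class ArenaSketch:
    """Bump allocator that requests growing buffers from a child allocator."""

    def __init__(self, child_alloc, first_size=64):
        self.child_alloc = child_alloc  # called as child_alloc(n) -> bytearray
        self.next_size = first_size     # size of the next request to the child
        self.buffers = []               # every buffer obtained from the child
        self.offset = 0                 # bump pointer into the newest buffer

    def alloc(self, n):
        if not self.buffers or self.offset + n > len(self.buffers[-1]):
            # Newest buffer is full: ask the child for a larger one and
            # double the size of the following request (assumed policy).
            size = max(self.next_size, n)
            self.buffers.append(self.child_alloc(size))
            self.next_size = size * 2
            self.offset = 0
        start = self.offset
        self.offset += n
        return memoryview(self.buffers[-1])[start:start + n]

    def deinit(self):
        # Release the whole arena at once; no per-allocation free is needed.
        self.buffers.clear()
        self.offset = 0


child_calls = []


def counting_child(n):
    """Stand-in for the expensive underlying allocator; records each request."""
    child_calls.append(n)
    return bytearray(n)


arena = ArenaSketch(counting_child)
chunks = [arena.alloc(16) for _ in range(10)]
```

Ten 16-byte allocations reach the child allocator only twice in this sketch, and a single `deinit()` drops everything, which is why cleanup code for something like a parsed AST gets simpler.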
Andrew Kelley
ef6260b3a7 Merge remote-tracking branch 'origin/master' into llvm6 2018-02-11 23:49:20 -05:00
Andrew Kelley
f2d601661d fix exported variable not named in the object file
closes #771
2018-02-11 16:46:02 -05:00
Andrew Kelley
e743b30bbf std: refactor posixOpen to be friendlier to error return traces 2018-02-11 05:26:51 -05:00
Andrew Kelley
46aa416c48 std.os and std.io API update
* move std.io.File to std.os.File
* add `zig fmt` to self hosted compiler
* introduce std.io.BufferedAtomicFile API
* introduce std.os.AtomicFile API
* add `std.os.default_file_mode`
* change FileMode on posix from being a usize to a u32
* add std.os.File.mode to return mode of an open file
* std.os.copyFile copies the mode from the source file instead of
  using the default file mode for the dest file
* move `std.os.line_sep` to `std.cstr.line_sep`
2018-02-10 21:02:24 -05:00
Andrew Kelley
8c31eaf2a8 std zig tokenizer: don't require 3 newlines at the end of the source 2018-02-10 14:52:39 -05:00
Andrew Kelley
a2bd9f8912 std lib: modify allocator idiom
Before we accepted a nullable allocator for some stuff like
opening files. Now we require an allocator.

Use the mem.FixedBufferAllocator pattern if a bound on the amount
to allocate is known.

This also establishes the pattern that usually an allocator is the
first argument to a function (possibly after "self").

fix docs for std.cstr.addNullByte

self hosted compiler:
 * only build docs when explicitly asked to
 * clean up main
 * stub out zig fmt
2018-02-09 18:27:50 -05:00
Andrew Kelley
e7bf8f3f04 fix compiler crash switching on global error with no else 2018-02-09 13:49:58 -05:00
Andrew Kelley
1fb308ceee self hosted compiler: move tokenization and parsing to std lib 2018-02-09 13:08:02 -05:00
Andrew Kelley
3919afcad2 fix crash with error peer type resolution
closes #765
2018-02-09 11:16:04 -05:00
Andrew Kelley
2c697e50db appveyor: don't try to build for mingw
pacman is giving me:
:: msys2-runtime and catgets are in conflict.
Remove catgets? [y/N] error: unresolvable package conflicts detected
error: failed to prepare transaction (conflicting dependencies)
2018-02-09 01:15:17 -05:00
Andrew Kelley
5911962842 Merge pull request #759 from zig-lang/error-sets
Error Sets
2018-02-09 00:47:57 -05:00
Andrew Kelley
8e554561df appveyor: answer Yes to all pacman questions 2018-02-09 00:47:13 -05:00
Andrew Kelley
32c988a2d7 fix build runner on windows 2018-02-09 00:24:23 -05:00
Andrew Kelley
916d24cd21 add compile error tests for error sets 2018-02-08 23:44:21 -05:00
Andrew Kelley
4b16874f04 add test for comptime err to int with only 1 member of set 2018-02-08 22:44:15 -05:00
Andrew Kelley
ee982ae162 syntax: parse ?error!i32 as ?(error!i32) 2018-02-08 22:30:08 -05:00
Andrew Kelley
0efe441dfd if statements support comptime known test error, runtime payload 2018-02-08 22:18:13 -05:00
Andrew Kelley
54c06bf715 error sets: runtime safety for int-to-err and err set cast 2018-02-08 21:54:44 -05:00
Andrew Kelley
8fc6e31567 std: fix return type of std.c.write 2018-02-08 20:46:12 -05:00
Andrew Kelley
f9be970375 Merge remote-tracking branch 'origin/master' into error-sets 2018-02-08 20:45:26 -05:00
Andrew Kelley
57edd4dcb3 error sets - fix bad value for constant error literal 2018-02-08 18:13:07 -05:00
Marc Tiehuis
1c236b0766 Add ArrayList functions (#755)
at - Get the item at the n-th index.

insert - Insert an item into the middle of the list, resizing and copying
existing elements if needed.

insertSlice - Insert a slice into the middle of the list, resizing and
copying existing elements if needed.
2018-02-08 11:22:31 -05:00
Andrew Kelley
fee875770c error set casting building 2018-02-08 11:09:18 -05:00
Andrew Kelley
76239f2089 error sets - update langref. all tests passing 2018-02-08 03:02:41 -05:00
Andrew Kelley
0d5ff6f462 error sets - most tests passing 2018-02-08 02:08:45 -05:00
Andrew Kelley
68238d5678 fix comptime fn execution not returning error unions properly 2018-02-07 22:33:05 -05:00
Ben Noordhuis
dd20f558f0 implement openSelfExe() on darwin (#753) 2018-02-07 18:14:32 -05:00
Jeff Fowler
c88e6e8aee improve behavior of zig build (#754)
See #748
2018-02-07 17:45:20 -05:00
Andrew Kelley
5d9e3cb77f LLD patch: workaround for buggy MACH-O code
This reapplies 1a1414fc42
to the embedded LLD.
2018-02-07 17:38:33 -05:00
Andrew Kelley
38aed5af8b update embedded LLD to 6.0.0rc2 2018-02-07 17:38:02 -05:00
Andrew Kelley
aa043a6339 Merge remote-tracking branch 'origin/master' into llvm6 2018-02-07 17:27:30 -05:00
Ben Noordhuis
79ad1d9610 format struct pointers as "<typename>@<address>" (#752) 2018-02-07 16:18:48 -05:00
Ben Noordhuis
0090c2d70b DRY 'is slice?' conditionals in parser (#750) 2018-02-07 14:38:49 -05:00
Andrew Kelley
f99b8b006f error sets - fix most std lib compile errors 2018-02-05 18:09:13 -05:00
Andrew Kelley
6940212ecb error sets: fix peer resolution of error unions 2018-02-05 17:42:13 -05:00
Andrew Kelley
917e6fe370 handle linux returning EINVAL for large writes
See #743
2018-02-05 13:21:08 -05:00
Andrew Kelley
40e4e42a66 handle linux returning EINVAL for large reads
see #743
2018-02-05 12:48:29 -05:00
Andrew Kelley
44d8d654a0 fix test failure, organize code, add new compile error 2018-02-05 09:26:39 -05:00
Andrew Kelley
ec59f76526 Merge pull request #743 from bnoordhuis/linux-random
Use /dev/urandom and sysctl(RANDOM_UUID) on Linux.
2018-02-05 08:09:10 -05:00
Andrew Kelley
b7bc259093 make OutStream and InStream take an error set param 2018-02-05 07:38:24 -05:00
Andrew Kelley
893f1088df error sets - peer resolution for error unions 2018-02-05 01:49:14 -05:00
Andrew Kelley
15075d2c3d error sets - compile error for equality with no common errors 2018-02-05 00:05:04 -05:00
Andrew Kelley
31abef172a fix accidentally linking against kernel32 on non windows 2018-02-04 22:13:21 -05:00
Andrew Kelley
21ce559c9c add --forbid-library
to help track down accidentally linking against a library
2018-02-04 22:06:03 -05:00
Ben Noordhuis
73ee434c8c Use /dev/urandom and sysctl(RANDOM_UUID) on Linux.
Add fallback paths for when the getrandom(2) system call is not
available.  Try /dev/urandom first and sysctl(RANDOM_UUID) second.

The sysctl issues a warning in the system logs with some kernels but
that seems like an acceptable tradeoff for the fallback of a fallback.
2018-02-04 18:58:36 +01:00
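The fallback order described in 73ee434c8c (getrandom(2) first, then /dev/urandom, then sysctl) amounts to trying sources in sequence until one works. A minimal Python sketch of that pattern follows; it is not the Zig std lib code. `first_available`, `from_getrandom`, `from_urandom_file`, and `unavailable` are hypothetical names, the sysctl(RANDOM_UUID) path is omitted, and the demonstration at the end uses simulated sources so that it behaves the same on any platform.

```python
import os


def first_available(sources, n):
    """Try each (name, source) pair in order; return the first that succeeds."""
    for name, source in sources:
        try:
            return name, source(n)
        except (OSError, AttributeError, NotImplementedError):
            continue  # this source is unavailable here, fall through
    raise OSError("no entropy source available")


def from_getrandom(n):
    # os.getrandom exists only on Linux; elsewhere the attribute lookup
    # raises AttributeError, which first_available treats as "unavailable".
    return os.getrandom(n)


def from_urandom_file(n):
    with open("/dev/urandom", "rb") as f:
        return f.read(n)


# Order a real caller might use, mirroring the commit message.
real_sources = [("getrandom", from_getrandom), ("urandom", from_urandom_file)]


def unavailable(n):
    raise OSError("simulated: system call not available")


# Simulated run: the first source fails, so the second one is used.
name, data = first_available(
    [("getrandom", unavailable), ("urandom", lambda n: b"\x00" * n)], 4
)
```

In the simulated run the chain skips the failing source and reports that the bytes came from the second one.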
Andrew Kelley
61718742f7 *WIP* error sets - std lib test compile but try to link against windows 2018-02-03 14:42:20 -05:00
Andrew Kelley
ef5e7bb469 *WIP* error sets - an inferred error set can end up being the global one 2018-02-03 14:06:37 -05:00
Andrew Kelley
abf5ae6897 *WIP* error sets - support fns called at comptime 2018-02-03 11:51:29 -05:00
Andrew Kelley
b8f59e14cd *WIP* error sets - correctly resolve inferred error sets 2018-02-02 18:13:32 -05:00
Andrew Kelley
39d5f44863 *WIP* error sets - basic support working 2018-02-02 14:26:14 -05:00
Andrew Kelley
cfb2c67692 *WIP* error sets - rewrite "const cast only" function 2018-02-02 11:50:19 -05:00
Andrew Kelley
15eb28efaf Merge pull request #738 from corngood/cygwin-fixes
make lld include paths private
2018-02-02 10:53:54 -05:00
David McFarland
4ec856b0f0 make lld include paths private
This fixes a build failure on cygwin caused by <string.h> -> <strings.h> taking
the latter from one of the lld paths.
2018-02-02 10:49:31 -04:00
Andrew Kelley
406496ca33 *WIP* error sets - allow peer type resolution to create new error set 2018-02-01 23:32:09 -05:00
Andrew Kelley
13b36d458f *WIP* error sets - fix implicit cast 2018-02-01 10:23:25 -05:00
Andrew Kelley
5f518dbeb9 *WIP* error sets converting std lib 2018-01-31 22:48:40 -05:00
Andrew Kelley
02b61224b2 add docs for memberType, memberCount, memberName 2018-01-31 20:56:53 -05:00
Andrew Kelley
e6d4028a84 docs: move source encoding section 2018-01-31 20:42:27 -05:00
Andrew Kelley
3a11757d57 add docs recommending to only have 1 cImport 2018-01-31 20:18:47 -05:00
Andrew Kelley
a795e4ce32 add some docs for reflection 2018-01-31 11:47:56 -05:00
Andrew Kelley
44f38b04b0 fix assertion fail when using global var number literal
closes #697
2018-01-31 11:13:39 -05:00
Andrew Kelley
5161d70620 *WIP* error sets 2018-01-31 01:51:31 -05:00
Andrew Kelley
40ca39d3d5 fix error message mentioning unreachable instead of noreturn 2018-01-31 01:44:52 -05:00
Andrew Kelley
3ef6a00bb8 add compile error for duplicate struct, enum, union fields
closes #730
2018-01-30 11:52:03 -05:00
Andrew Kelley
0995a81b8b langref: remove page title header 2018-01-30 10:31:01 -05:00
Andrew Kelley
d6b7d9090e Merge pull request #729 from zig-lang/www-changes
Improve documentation styling for mobile devices
2018-01-30 01:06:20 -05:00
Andrea Orru
7eea20bc50 Add IntrusiveLinkedList to index.zig 2018-01-29 21:02:57 -08:00
Marc Tiehuis
5e9f87c3bd Improve documentation styling for mobile devices
- No overscrolling on small screens
- Font-size is reduced for more content per screen
- Tables + Code blocks scroll within a block to avoid page-widening
2018-01-30 17:33:38 +13:00
Andrew Kelley
1c60f31450 add compile error for calling naked function 2018-01-29 14:01:12 -05:00
Andrew Kelley
96c9a9bdb3 Merge remote-tracking branch 'origin/master' into llvm6 2018-01-29 13:26:09 -05:00
Andrew Kelley
2b5e0b66a2 std: fix fn return syntax for zen os 2018-01-29 10:57:27 -05:00
Andrew Kelley
abe6c2d585 allow packed containers in extern functions 2018-01-29 10:57:09 -05:00
Andrew Kelley
f66ac9a5e7 fix crash when align 1 field before self referential...
...align 8 field as slice return type

closes #723
2018-01-27 18:30:36 -05:00
Andrew Kelley
ad3e2a5da0 fix compiler crash on function with invalid return type
closes #722
2018-01-26 10:37:18 -05:00
Andrew Kelley
47be64af5a Merge remote-tracking branch 'origin/master' into llvm6 2018-01-25 11:51:41 -05:00
Andrew Kelley
f7670882af Merge pull request #720 from zig-lang/require-return-type
syntax: functions require return type. remove `->`
2018-01-25 10:03:26 -05:00
Andrew Kelley
3671582c15 syntax: functions require return type. remove ->
The purpose of this is:

 * Only one way to do things
 * Changing a function with void return type to return a possible
   error becomes a 1 character change, subtly encouraging
   people to use errors.

See #632

Here are some imperfect sed commands for performing this update:

remove arrow:

```
sed -i 's/\(\bfn\b.*\)-> /\1/g' $(find . -name "*.zig")
```

add void:

```
sed -i 's/\(\bfn\b.*\))\s*{/\1) void {/g' $(find ../ -name "*.zig")
```

Some cleanup may be necessary, but this should do the bulk of the work.
2018-01-25 04:10:11 -05:00
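The two sed commands in commit 3671582c15 can also be sketched as a line-by-line rewrite in Python. This is only an illustration and carries the same "imperfect" caveat as the sed versions; the helper name `migrate_line` is hypothetical and not part of any Zig tooling.

```python
import re

# The same two patterns as the sed commands, in Python regex syntax.
ARROW = re.compile(r"(\bfn\b.*)-> ")
NO_RETURN_TYPE = re.compile(r"(\bfn\b.*)\)\s*\{")


def migrate_line(line):
    """Rewrite one line of old-syntax Zig source (hypothetical helper)."""
    # Step 1: remove the arrow that used to precede the return type.
    line = ARROW.sub(r"\1", line)
    # Step 2: a ")" followed directly by "{" means no return type was
    # written, so insert an explicit void.
    line = NO_RETURN_TYPE.sub(r"\1) void {", line)
    return line


if __name__ == "__main__":
    for old in ["fn main() {", "fn add(a: i32, b: i32) -> i32 {"]:
        print(migrate_line(old))
```

Running the arrow removal first matters: once the arrow is gone, a function that already had a return type no longer has `)` directly before `{`, so the second step leaves it alone.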
Andrew Kelley
e5bc5873d7 rename "debug safety" to "runtime safety"
closes #437
2018-01-25 01:46:12 -05:00
Andrew Kelley
b71a56c9df cleanups that I meant to put in the previous commit 2018-01-23 23:12:38 -05:00
Andrew Kelley
b3a6faf13e replace %defer with errdefer
See #632

now we have 1 less sigil
2018-01-23 23:08:09 -05:00
Andrew Kelley
ad2527d47a clean up readme 2018-01-23 22:56:03 -05:00
Andrew Kelley
c2838f2442 fix printf format specifier 2018-01-23 11:40:22 -05:00
Andrew Kelley
b8dcdc75c1 Merge pull request #716 from zig-lang/export-c-additions
Add array type handling for gen_h
2018-01-23 09:20:57 -05:00
Marc Tiehuis
470ec91164 Add array type handling for gen_h 2018-01-23 23:38:20 +13:00
Andrew Kelley
fa7072f3f2 docgen: verify internal links 2018-01-22 23:06:07 -05:00
Andrew Kelley
cf39819478 add new kind of test: generating .h files. and more
* docgen supports obj_err code kind for demonstrating
   errors without explicit test cases
 * add documentation for `extern enum`. See #367
 * remove coldcc keyword and add @setIsCold. See #661
 * add compile errors for non-extern struct, enum, unions
   in function signatures
 * add .h file generation for extern struct, enum, unions
2018-01-22 22:24:07 -05:00
Andrew Kelley
cacba6f435 fix crash on union-enums with only 1 field
closes #713
2018-01-22 17:23:23 -05:00
Andrew Kelley
b52bffcf8d appveyor: add language reference to build artifacts 2018-01-22 16:14:06 -05:00
Andrew Kelley
5b7ae86af4 fix crash when switching on enum with 1 field and no switch prongs
closes #712
2018-01-21 14:44:24 -05:00
Andrew Kelley
517e8ea426 remove unused function, fixes mingw build 2018-01-20 02:49:53 -05:00
Andrew Kelley
ddd04a7b46 fix docgen on windows 2018-01-19 22:17:31 -05:00
Andrew Kelley
ec27d3b4ba Merge pull request #711 from zig-lang/fix-build-template
Fix build template to match build runner changes
2018-01-19 20:47:20 -05:00
Marc Tiehuis
a7e10565fc Fix build template to match build runner changes
API changed in 7b57454cc1.
2018-01-20 13:32:49 +13:00
Andrew Kelley
890bf001db os_rename uses MoveFileEx on windows 2018-01-19 16:53:08 -05:00
Andrew Kelley
9f5c0b6e60 windows-compatible os_rename function
windows libc rename() requires destination file path to not exist
2018-01-19 16:31:21 -05:00
Andrew Kelley
2eede35577 Merge pull request #710 from Hejsil/seekto-getpos-windows
Implemented windows versions of seekTo and getPos
2018-01-19 16:17:04 -05:00
Jimmi Holst Christensen
d8469e3c7c usize might be same size as LARGE_INTEGER. If that's the case, then we don't want to compare pos to @maxValue(usize). 2018-01-19 22:08:44 +01:00
Jimmi Holst Christensen
a1a69f24c8 We now make a more correct conversion from windows LARGE_INTEGER type to usize 2018-01-19 22:05:56 +01:00
Jimmi Holst Christensen
61497893d3 Removed bitcast from usize to isize in seekTo 2018-01-19 21:57:13 +01:00
Andrew Kelley
613c4dbf58 temporary workaround for os.deleteTree not implemented for windows/mac
See #709
2018-01-19 15:51:37 -05:00
Jimmi Holst Christensen
8be606ec80 Now using the right unexpectedError in seekForward 2018-01-19 21:51:10 +01:00
Jimmi Holst Christensen
a76023bcd8 Removed PLARGE_INTEGER 2018-01-19 21:49:16 +01:00
Jimmi Holst Christensen
90714a3831 Implemented windows versions of seekTo and getPos 2018-01-19 21:30:57 +01:00
Andrew Kelley
21e8ecbafa readme: specify that we need exactly llvm 5.0.1
closes #708
2018-01-19 04:01:03 -05:00
Andrew Kelley
2c25c8aeed docs: remove references to %% prefix operator
also cleanup the table of contents
2018-01-19 03:47:27 -05:00
Andrew Kelley
ea623f2d39 all doc code examples are now tested
improve color scheme of docs
make docs depend on no external files
fix broken example code in docs

closes #465
2018-01-19 03:21:47 -05:00
Andrew Kelley
4b64c777ee add compile error for shifting by negative comptime integer
closes #698
2018-01-18 17:47:21 -05:00
Andrew Kelley
0fc645ab70 emit a compile error for @panic called at compile time
closes #706
2018-01-18 17:15:36 -05:00
Andrew Kelley
0b8f19fcba fix null debug info for 0-length array type
closes #702
2018-01-18 15:08:20 -05:00
Andrew Kelley
0aae96b5f0 test: fix brace expansion test not checking invalid inputs 2018-01-18 11:41:20 -05:00
Andrew Kelley
4556f44806 LLD patch: workaround for buggy MACH-O code
This reapplies 1a1414fc42
to the embedded LLD.
2018-01-17 17:30:38 -05:00
Andrew Kelley
4aed7ea6f8 update embedded LLD to 6.0.0rc1 2018-01-17 17:29:21 -05:00
Andrew Kelley
48cd808185 Merge remote-tracking branch 'origin/master' into llvm6 2018-01-17 13:11:21 -05:00
Andrew Kelley
a4e8e55908 Merge pull request #701 from Hejsil/fix-xor-with-zero
Fixed bigint_xor for non-negative numbers
2018-01-17 10:24:27 -05:00
Jimmi Holst Christensen
1d6f54cc7d A few more non-negative cases, just to be sure we've covered everything 2018-01-17 14:35:13 +01:00
Jimmi Holst Christensen
fa2c3be341 More tests, and fixed non-negative bigint xor 2018-01-17 14:31:47 +01:00
Jimmi Holst Christensen
db0fc32ab2 fixed xor with zero 2018-01-17 14:00:27 +01:00
Andrew Kelley
2e6125bc66 ziglang.org home page no longer in this repo
update docs examples which use build-exe to be tested

See #465
2018-01-17 03:24:49 -05:00
Marc Tiehuis
7a3fd89d25 Add Sha3 hashing functions
These are on the slower side and could be improved. No performance optimizations
have been done yet.

```
Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
```

-- Sha3-256

```
Zig --release-fast
    93 Mb/s
Zig --release-safe
    99 Mb/s
Zig
    4 Mb/s
```

-- Sha3-512

```
Zig --release-fast
    49 Mb/s
Zig --release-safe
    54 Mb/s
Zig
    2 Mb/s
```

Interestingly, release-safe is producing slightly better code than
release-fast.
2018-01-17 21:19:45 +13:00
Marc Tiehuis
dfd5363494 Add throughput test program
Blake performance numbers for reference:

```
Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
```

-- Blake2s

```
Zig --release-fast
    485 Mb/s
Zig --release-safe
    377 Mb/s
Zig
    11 Mb/s
```

-- Blake2b

```
Zig --release-fast
    616 Mb/s
Zig --release-safe
    573 Mb/s
Zig
    18 Mb/s
```
2018-01-17 21:19:45 +13:00
Marc Tiehuis
7af53d0826 Fix crypto exports 2018-01-17 21:19:45 +13:00
Andrew Kelley
1eda7e0fde docgen: support executing exe code examples
See #465
2018-01-17 01:50:35 -05:00
Andrew Kelley
5aefabe045 docgen: validate See Also sections
See #465
2018-01-17 00:22:53 -05:00
Andrew Kelley
2774fe8a1b docgen auto generates table of contents
See #465
2018-01-17 00:22:53 -05:00
Andrew Kelley
4bdfc8a10a fix error return traces pointing to off-by-one source line
See #651
2018-01-17 00:22:53 -05:00
Josh Wolfe
24c2ff5cae Revert "Buffer.toSliceCopy"
This reverts commit c58f5a4742.
2018-01-16 13:45:34 -07:00
Josh Wolfe
c58f5a4742 Buffer.toSliceCopy 2018-01-16 13:28:53 -07:00
Andrew Kelley
b897e98d30 Merge remote-tracking branch 'origin/master' into llvm6 2018-01-16 12:26:04 -05:00
Andrew Kelley
ee9ab15679 Merge pull request #695 from Hejsil/tranlate-c-fixes
Translate c fixes - undefined variable initialization and non-bool if statements
2018-01-16 10:32:37 -05:00
Jimmi Holst Christensen
3974b7d31d translate_c can now translate if statements on integers and floats 2018-01-16 15:48:28 +01:00
Jimmi Holst Christensen
f59dcc5546 Fixed tests for undefined variables 2018-01-16 15:21:48 +01:00
Andrew Kelley
8b280d5b31 Merge pull request #689 from zig-lang/blake2
Add Blake2X hash functions
2018-01-16 09:13:09 -05:00
Jimmi Holst Christensen
821cbd7a1b Output "undefined" on uninitialized variables 2018-01-16 15:01:02 +01:00
Marc Tiehuis
73b4f09845 Add crypto internal test functions 2018-01-17 00:20:20 +13:00
Marc Tiehuis
66a24c9c00 Merge branch 'master' into blake2 2018-01-17 00:20:06 +13:00
Marc Tiehuis
fa7b33549e Change crypto functions to fill a buffer
- Rename blake2x -> blake2
 - Fix blake2s truncated tests
2018-01-17 00:17:48 +13:00
Andrew Kelley
6a95b88d1b fix bigint remainder division
See #405
2018-01-16 03:09:44 -05:00
Andrew Kelley
84d8584c5b implement bigint div and rem
See #405
2018-01-16 02:22:19 -05:00
Andrew Kelley
92fc5947fc fix compiler crash related to @alignOf 2018-01-15 20:44:21 -05:00
Andrew Kelley
5a4968484b Merge branch 'wip-err-ret-trace' 2018-01-15 16:28:30 -05:00
Andrew Kelley
6ec9933fd8 fix getting debug info twice in default panic handler 2018-01-15 16:26:13 -05:00
Marc Tiehuis
4cf86b4a94 Add Blake2X hash functions
The truncated output variants currently are dependent on a more complete
bigint implementation in the compiler.
2018-01-15 23:14:13 +13:00
Andrew Kelley
c9ac607bd3 add builtin.have_error_return_tracing 2018-01-15 00:14:14 -05:00
Andrew Kelley
7b57454cc1 clean up error return tracing
* error return tracing is disabled in release-fast mode
 * add @errorReturnTrace
 * zig build API changes build return type from `void` to `%void`
 * allow `void`, `noreturn`, and `u8` from main. closes #535
2018-01-15 00:01:02 -05:00
Andrew Kelley
d973b40884 stack traces are a variable number of frames 2018-01-14 19:40:02 -05:00
Andrew Kelley
f0df2cdde9 error return traces use a zig-provided function to save binary size 2018-01-14 16:26:06 -05:00
Andrew Kelley
793f031c4c remove 32-bit windows from supported targets list
we still want to support it, but there are too many bugs
to claim that we support it right now.

See #537
2018-01-14 15:17:07 -05:00
Andrew Kelley
fa024f8092 error return trace pointer prefixes other params
instead of being last. This increases the chances that it can
remain in the same register between calls.
2018-01-14 14:35:43 -05:00
Andrew Kelley
971a6fc531 fix duplicate stack trace code 2018-01-14 10:19:21 -05:00
Andrew Kelley
e7e7625633 Merge pull request #687 from zig-lang/sha2
Add Sha2 functions
2018-01-13 21:38:29 -05:00
Marc Tiehuis
9be9f1ad20 Disable win32 tests for Sha2 + correct lengths 2018-01-14 09:58:30 +13:00
Marc Tiehuis
1f3ed5cf27 Change indexing variable types for crypto functions 2018-01-13 22:44:58 +13:00
Marc Tiehuis
2659ac01be Add Sha2 functions
We report the fastest time measured across multiple runs. Each compiler was
tested with multiple flags and the best result chosen.

```
Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Gcc: 7.2.1 20171224
Clang: 5.0.1
Zig: 0.1.1.304f6f1d
```

See https://www.nayuki.io/page/fast-sha2-hashes-in-x86-assembly.

```
Gcc -O2
    219 Mb/s
Clang -O2
    213 Mb/s
Zig --release-fast
    284 Mb/s
Zig --release-safe
    211 Mb/s
Zig
    6 Mb/s
```

```
Gcc -O2
    350 Mb/s
Clang -O2
    354 Mb/s
Zig --release-fast
    426 Mb/s
Zig --release-safe
    300 Mb/s
Zig
    11 Mb/s
```
2018-01-13 22:37:47 +13:00
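The "fastest time across multiple runs" method used for these numbers can be sketched as follows. This is not the throughput program from the repository: it uses Python's `hashlib` as a stand-in hash implementation, the function name `best_throughput` is hypothetical, and it assumes the reported unit is megabytes per second.

```python
import hashlib
import time


def best_throughput(hash_name, size_bytes=1 << 20, runs=5):
    """Return the best observed throughput in MB/s over several runs."""
    data = b"\x00" * size_bytes
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        hashlib.new(hash_name, data).digest()
        elapsed = time.perf_counter() - start
        # Keep the fastest run; slower runs mostly reflect system noise.
        best = min(best, elapsed)
    # Guard against a zero reading from a very coarse timer.
    return size_bytes / max(best, 1e-9) / 1e6


if __name__ == "__main__":
    for name in ("sha256", "sha512"):
        print(name, round(best_throughput(name)), "MB/s")
```

Taking the minimum rather than the mean is a common choice for this kind of micro-benchmark, since interference from other processes can only make a run slower.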
Andrew Kelley
4551489b92 typecheck the panic function 2018-01-13 01:00:50 -05:00
Andrew Kelley
a2315cfbfc Merge pull request #686 from zig-lang/md5-sha1
Add Md5 and Sha1 functions
2018-01-13 00:00:33 -05:00
Marc Tiehuis
51fdbf7f8c Add Md5 and Sha1 hash functions
Some performance comparisons to C.

We report the fastest time measured across multiple runs.

The block hashing functions use the same md5/sha1 methods.

```
Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Gcc: 7.2.1 20171224
Clang: 5.0.1
Zig: 0.1.1.304f6f1d
```

See https://www.nayuki.io/page/fast-md5-hash-implementation-in-x86-assembly:

```
gcc -O2
    661 Mb/s
clang -O2
    490 Mb/s
zig --release-fast and zig --release-safe
    570 Mb/s
zig
    50 Mb/s
```

See https://www.nayuki.io/page/fast-sha1-hash-implementation-in-x86-assembly:

```
gcc -O2
    588 Mb/s
clang -O2
    563 Mb/s
zig --release-fast and zig --release-safe
    610 Mb/s
zig
    21 Mb/s
```

In short, zig provides pretty useful tools for writing this sort of
code. Zig is ahead of clang (which uses the same LLVM backend) on both
hashes, and is slower only than GCC on md5.
2018-01-13 14:40:21 +13:00
Marc Tiehuis
304f6f1d01 Add integer rotation functions 2018-01-13 13:23:12 +13:00
Andrew Kelley
32ea6f54e5 *WIP* proof of concept error return traces 2018-01-12 02:12:11 -05:00
Andrew Kelley
7ec783876a functions which can return errors have secret stack trace param
See #651
2018-01-11 23:04:08 -05:00
Andrew Kelley
eb3726c502 Merge branch 'master' into llvm6 2018-01-11 22:26:55 -05:00
Andrew Kelley
3268276b58 the same string literal codegens to the same constant
this makes it so that you can send the same string literal
as a comptime slice and get the same type
2018-01-11 21:02:30 -05:00
Andrew Kelley
465e75bc5a Merge pull request #682 from zig-lang/fix-endian
Fix endian swap parameters
2018-01-11 02:51:17 -05:00
Marc Tiehuis
899e36489d Fix endian swap parameters 2018-01-11 19:50:08 +13:00
Andrew Kelley
891c93c118 Merge pull request #681 from zig-lang/hw-math
Add hw sqrt for x86_64
2018-01-10 10:22:40 -05:00
Andrew Kelley
d4f791cf6c Merge pull request #680 from zig-lang/intrusiveLinkedList
Intrusive linked lists
2018-01-10 10:13:15 -05:00
Marc Tiehuis
24cd99160c Add hw sqrt for x86_64 2018-01-10 19:53:36 +13:00
Andrea Orru
19343db593 Intrusive linked lists 2018-01-10 00:33:07 -05:00
Andrew Kelley
d1d3dbc7b5 Merge branch 'master' into llvm6 2018-01-09 09:56:24 -05:00
Andrew Kelley
3c094116aa remove %% prefix operator
See #632
closes #545
closes #510

this makes #651 higher priority
2018-01-09 00:51:51 -05:00
Andrea Orru
98a95cc698 exit, createThread for zen 2018-01-08 12:16:23 -05:00
Andrew Kelley
5a8d87f504 Merge branch 'master' into llvm6 2018-01-08 10:34:45 -05:00
Andrew Kelley
598170756c a catch unreachable generates unwrap-error code
See #545
See #510
See #632
2018-01-07 18:13:54 -05:00
Andrew Kelley
632d143bff replace a %% b with a catch b
See #632

better fits the convention of using keywords for control flow
2018-01-07 17:28:20 -05:00
Andrew Kelley
66717db735 replace %return with try
See #632

better fits the convention of using keywords for control flow
2018-01-07 16:53:13 -05:00
Andrea Orru
de1f57926f Merge branch 'master' of github.com:zig-lang/zig 2018-01-07 04:43:15 -05:00
Andrea Orru
3182857224 Adding zen support 2018-01-07 04:43:08 -05:00
Andrew Kelley
32ba0dcea9 update hello world docs 2018-01-07 01:59:23 -05:00
Andrew Kelley
e7c04b6df2 add a test for returning a type that closes over a local const
closes #552
2018-01-07 00:50:43 -05:00
Andrew Kelley
bb39e503c0 fix struct inside function referencing local const
closes #672

the crash and compile errors are fixed but structs
inside functions still get named after the functions
they're in. this will be fixed later.
2018-01-07 00:28:37 -05:00
Andrea Orru
ad438cfd40 Merge branch 'master' of github.com:zig-lang/zig 2018-01-06 23:13:51 -05:00
Andrea Orru
e932919e68 Darwin -> MacOSX, added Zen. See #438 2018-01-06 23:10:53 -05:00
Andrew Kelley
a9d2a7f002 Merge pull request #674 from Hejsil/readInt-calling-fix
Fixed calls to mem.readInt
2018-01-06 19:45:08 -05:00
Jimmi Holst Christensen
e91136d61f Fixed the call to mem.readInt in endian.swap 2018-01-07 00:24:35 +01:00
Jimmi Holst Christensen
6f85c860c6 Fixed the call to mem.readInt in Rand.scalar 2018-01-07 00:24:17 +01:00
Andrew Kelley
38658a597b Merge branch 'master' into llvm6 2018-01-06 02:59:17 -05:00
Andrew Kelley
dde7cc52d2 fix expm1 implementation
in the llvm6 branch with assertions on, it failed the test
this fixes it
2018-01-06 02:58:45 -05:00
Andrew Kelley
17e68c4a11 disable NewGVN
closes #673
2018-01-06 00:15:37 -05:00
Andrew Kelley
2200c2de6f translate-c: update to clang 6.0.0 which has more binary operators 2018-01-05 13:53:04 -05:00
Andrew Kelley
5d9a8cbe1a Merge remote-tracking branch 'origin/master' into llvm6 2018-01-05 13:46:21 -05:00
Andrew Kelley
e08a4ea62d Merge branch 'appveyor' 2018-01-05 12:16:16 -05:00
Andrew Kelley
2c35e24bd9 workaround for microsoft releasing windows SDK with wrong version 2018-01-05 11:35:46 -05:00
Andrew Kelley
79d50d9933 appveyor: enable verbose link for self hosted compiler 2018-01-04 23:43:46 -05:00
Andrew Kelley
f377b1e886 Revert "appveyor ci: look for newer windows sdk version"
This reverts commit 31d632b72e.

according to
https://developer.microsoft.com/en-us/windows/downloads/sdk-archive
10240 is actually 26624 and there was some kind of versioning issue.
2018-01-04 23:37:21 -05:00
Andrew Kelley
7f0b12a481 appveyor: skip building self hosted compiler for now 2018-01-04 23:30:03 -05:00
Andrew Kelley
25ad0b47e2 appveyor: try using vcvarsall 2018-01-04 23:11:27 -05:00
Andrew Kelley
d1ef17e3cd appveyor: set VCINSTALLDIR 2018-01-04 22:59:39 -05:00
Andrew Kelley
1b120d1e49 update windows build to llvm 5.0.1
llvm-config.exe does not handle diaguids.lib for us so we have to
duplicate the work.
2018-01-04 22:46:26 -05:00
Andrew Kelley
21a552682e Revert "try using appveyor's llvm copy"
This reverts commit 35dc987dc8.
2018-01-04 19:06:48 -05:00
Andrew Kelley
35dc987dc8 try using appveyor's llvm copy 2018-01-04 18:54:46 -05:00
Andrew Kelley
31d632b72e appveyor ci: look for newer windows sdk version 2018-01-04 18:34:42 -05:00
Andrew Kelley
7e65fe7ac3 fix test regressions on windows from previous commit 2018-01-04 16:36:59 -05:00
Andrew Kelley
d008e209e7 self-hosted compiler works on macos 2018-01-04 15:30:22 -05:00
Andrew Kelley
e1c03d9e8e self-hosted compiler works on windows
* better error message for realpath failing
 * fix bug in std.io.readFileAllocExtra incorrectly returning
   error.EndOfStream
 * implement std.os.selfExePath and std.os.selfExeDirPath for windows
2018-01-04 13:48:45 -05:00
Andrew Kelley
0cd63b28f3 fix self-hosted build on windows 2018-01-03 22:38:13 -05:00
Andrew Kelley
477e3f64fc self-hosted build: use llvm-config from stage1 2018-01-03 21:32:50 -05:00
Andrew Kelley
5c8600d790 add december in review to reading material; fix docs 2018-01-03 21:11:58 -05:00
Andrew Kelley
8eae4a0967 Merge branch 'master' into llvm6 2018-01-03 20:53:53 -05:00
Andrew Kelley
5a800db48c build: std files and c header files are only specified once
In the CMakeLists.txt file. And then we communicate the list
to the zig build.
2018-01-03 19:39:04 -05:00
Andrew Kelley
a45db7e853 add building the self hosted compiler to the main test suite 2018-01-03 18:25:17 -05:00
Andrew Kelley
5b156031e9 enum tag values are expressions so no parentheses needed 2018-01-03 16:05:37 -05:00
Andrew Kelley
5c988cc722 readme: update macos installation instructions 2018-01-03 14:16:20 -05:00
Andrew Kelley
36ff26609b fix self hosted compiler on windows 2018-01-03 04:55:49 -05:00
Andrew Kelley
6281a511e1 add noInlineCall to docs 2018-01-03 03:27:48 -05:00
Andrew Kelley
c741d3f4b2 add test for while respecting implicit comptime 2018-01-03 03:15:06 -05:00
Andrew Kelley
d9d61ed563 doc fixes 2018-01-03 02:51:45 -05:00
Andrew Kelley
1d77f8db28 Merge branch 'master' into llvm6 2018-01-03 00:42:00 -05:00
Andrew Kelley
0ea50b3157 ir: new pass iteration strategy
Before:
 * IR basic blocks are in arbitrary order
 * when doing an IR pass, when a block is encountered, code
   must look at all the instructions in the old basic block,
   determine what blocks are referenced, and queue up those
   old basic blocks first.
 * This had a bug (See #667)

Now:
 * IR basic blocks are required to be in an order that guarantees
   they will be referenced by a branch, before any instructions
   within are referenced.
   ir pass1 is updated to meet this constraint.
 * When doing an IR pass, we iterate over old basic blocks
   in the order they appear. Blocks which have not been
   referenced are discarded.
 * After the pass is complete, we must iterate again to look
   for old basic blocks which now point to incomplete new
   basic blocks, due to comptime code generation.
 * This last part can probably be optimized - most of the time
   we don't need to iterate over the basic block again.

closes #667
2018-01-02 21:08:12 -05:00
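The in-order iteration described under "Now" in commit 0ea50b3157 can be modelled with a toy example. Basic blocks are reduced here to a name plus a list of branch targets; `run_pass` and this data layout are assumptions made for illustration, not the compiler's actual IR structures. The sketch covers only the single forward walk and the discarding of unreferenced blocks, not the follow-up iteration for blocks left incomplete by comptime code generation.

```python
def run_pass(blocks, entry):
    """Walk blocks in their given order and keep only referenced ones.

    blocks: ordered list of (name, branch_targets) pairs, arranged so that
            a live block is referenced by a branch before it appears.
    entry:  name of the entry block, which counts as referenced.
    """
    referenced = {entry}
    kept = []
    for name, targets in blocks:
        if name not in referenced:
            # Nothing processed so far branches here: discard the block.
            continue
        kept.append(name)
        # Branches in a kept block make their targets live.
        referenced.update(targets)
    return kept


if __name__ == "__main__":
    blocks = [
        ("entry", ["then", "end"]),
        ("then", ["end"]),
        ("dead", ["end"]),
        ("end", []),
    ]
    print(run_pass(blocks, "entry"))
```

Because of the ordering requirement, one forward walk is enough to decide which blocks are live; there is no need to inspect a block's instructions and queue up its targets first, which is the step the old strategy needed.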
Andrew Kelley
aafb832288 Merge pull request #668 from sparrisable/master
Added format for floating point numbers. {.x} where x is the number of decimals.
2017-12-30 23:21:02 -05:00
Peter Rönnquist
d15b02a6b6 Added format for floating point numbers. {.x} where x is the number of decimals. 2017-12-31 00:27:58 +01:00
Josh Wolfe
4e3d7fc4bc fix self-hosted parser test 2017-12-26 23:29:15 -07:00
Josh Wolfe
192a039173 move utf8 parsing to std
source files no longer need to end with a newline
2017-12-26 23:17:33 -07:00
Andrew Kelley
6bfaf262d5 Merge branch 'master' into llvm6 2017-12-26 21:44:08 -05:00
Josh Wolfe
08dd1b553b set compile flags for zig_cpp 2017-12-26 18:05:43 -07:00
Andrew Kelley
6fece14cfb self-hosted: build against zig_llvm and embedded LLD
Now the self-hosted compiler re-uses the same C++ code for interfacing
with LLVM as the C++ compiler does.
It also links against the same LLD library files.
2017-12-26 19:44:08 -05:00
Andrew Kelley
2a25398c86 fix segfault when passing union enum with sub byte...
...field to const slice parameter

we use a packed struct internally to represent a const array
of disparate union values, and needed to update the internal
getelementptr instruction to recognize that.

closes #664
2017-12-24 04:11:58 -05:00
Andrew Kelley
86397a532e docs: fix typo 2017-12-24 02:52:30 -05:00
Josh Wolfe
f0a1753607 add source encoding rules to the docs. see #663 2017-12-23 22:23:06 -07:00
Josh Wolfe
d6a74ed463 [self-hosted] source must be valid utf8. see #663 2017-12-23 21:47:13 -07:00
Josh Wolfe
fb96c3e73e debug needs to export FailingAllocator 2017-12-23 21:47:13 -07:00
Andrew Kelley
4183c6f1a5 move std/debug.zig to a subdirectory
self hosted compiler parser tests do some fuzz testing
2017-12-23 22:15:48 -05:00
Andrew Kelley
9dae796fe3 translate-c: set up debug scope for translated functions 2017-12-23 22:14:35 -05:00
Andrew Kelley
79c2ceb2d5 build: findLLVM correctly handles system libraries 2017-12-23 22:14:35 -05:00
Andrew Kelley
e0a1466bd8 build: add --search-prefix option 2017-12-23 22:14:35 -05:00
Andrew Kelley
2031989d98 std.os.path.resolve handles an absolute path that is missing the drive 2017-12-23 22:14:35 -05:00
Andrew Kelley
8b716f941d Merge branch 'master' into llvm6 2017-12-23 21:21:32 -05:00
Andrew Kelley
87ba004d46 translate-c: set up debug scope for translated functions 2017-12-23 21:20:38 -05:00
Andrew Kelley
c8302a5a0e build: findLLVM correctly handles system libraries 2017-12-23 21:19:48 -05:00
Josh Wolfe
0082989f22 [self-hosted] tokenizer error for ascii control codes 2017-12-23 18:35:45 -07:00
Andrew Kelley
3cbc244e98 build: add --search-prefix option 2017-12-23 20:21:57 -05:00
Andrew Kelley
74a12d818d std.os.path.resolve handles an absolute path that is missing the drive 2017-12-23 19:50:01 -05:00
Josh Wolfe
45ab752f9a source files must end with newline 2017-12-23 17:47:48 -07:00
Andrew Kelley
fe66046283 Merge remote-tracking branch 'origin/master' into llvm6 2017-12-23 12:00:25 -05:00
Andrew Kelley
39c7bd24e4 port most of main.cpp to self hosted compiler 2017-12-23 00:57:56 -05:00
Andrew Kelley
760b307e8a fix endianness of sub-byte integer fields in packed structs
closes #307
2017-12-22 18:27:33 -05:00
Andrew Kelley
e44a11341d std.math: remove unnecessary inline calls and
workaround windows 32 bit test failure
See #537
2017-12-22 13:14:07 -05:00
Josh Wolfe
0e7fb69bea bufPrint returns an error 2017-12-22 00:52:01 -07:00
Andrew Kelley
ea805c5fe7 fix darwin and windows from previous commit 2017-12-22 02:33:39 -05:00
Andrew Kelley
d917815d81 explicitly return from blocks
instead of last statement being expression value

closes #629
2017-12-22 00:50:30 -05:00
Andrew Kelley
8bc523219c add labeled loops, labeled break, labeled continue. remove goto
closes #346
closes #630

regression: translate-c can no longer translate switch statements.
after #629 we can resurrect and modify the code to utilize arbitrarily
returning from blocks.
2017-12-20 23:00:19 -05:00
Andrew Kelley
d686113bd2 fix crash when implicitly casting array of len 0 to slice
closes #660
2017-12-19 22:38:02 -05:00
Andrew Kelley
1cc450e6e7 fix assert when wrapping zero bit type in nullable
closes #659
2017-12-19 18:21:42 -05:00
Andrew Kelley
1435604b84 add sort.min and sort.max functions to stdlib 2017-12-19 17:35:38 -05:00
Andrew Kelley
2a8160e80f Merge branch 'export-rewrite'
introduces the `@export` builtin function which can be used
in a comptime block to conditionally export a function.

it also allows creation of aliases.

previous export syntax is still allowed.

closes #462
closes #420
2017-12-19 02:44:14 -05:00
Andrew Kelley
9d9201c3b4 bring back code that uses export and fix tests
partial revert of 1fdebc1dc4
2017-12-19 02:39:43 -05:00
Andrew Kelley
27ba4f0baf export keyword works again 2017-12-19 01:49:42 -05:00
Andrew Kelley
c627f9ea18 wip bring back export keyword 2017-12-19 01:19:49 -05:00
Andrew Kelley
1fdebc1dc4 wip export rewrite 2017-12-18 09:59:57 -05:00
Andrew Kelley
3f65887974 fix std.mem missing error.OutOfMemory decl
this will be fixed in a better way later by #632
2017-12-17 20:52:29 -05:00
Josh Wolfe
ab44939941 roughly parsing infix operators 2017-12-17 11:16:55 -07:00
Andrew Kelley
39e96d933e change mem.cmp to mem.lessThan and add test 2017-12-15 17:26:22 -05:00
Andrew Kelley
68f6332343 fix missing import from previous commit 2017-12-14 21:24:00 -05:00
Andrew Kelley
6bc0561d13 disable sort tests for 32-bit windows because of issue #537 2017-12-14 19:55:34 -05:00
Andrew Kelley
75ecfdf66d replace quicksort with blocksort
closes #657
2017-12-14 19:41:35 -05:00
Andrew Kelley
c9e01412a4 fix compiler crash in a nullable if after an if in...
...a switch prong of a switch with 2 prongs in an else

closes #656
2017-12-14 01:07:23 -05:00
Andrew Kelley
f55fdc00fc fix const and volatile qualifiers being dropped sometimes
in the expression `&const a.b`, the const (and/or volatile)
qualifiers would be incorrectly dropped.

closes #655
2017-12-13 21:53:52 -05:00
Andrew Kelley
84619abe9f add test for allowing slice[slice.len..slice.len] 2017-12-12 21:56:13 -05:00
Josh Wolfe
d295279b16 self-hosted: implement var decl align 2017-12-12 19:50:43 -07:00
Josh Wolfe
0003cc8105 self-hosted: implement addr of align parsing 2017-12-12 19:26:33 -07:00
Andrew Kelley
24c2703dfa self-hosted: look for llvm-config in homebrew 2017-12-12 17:25:57 -05:00
Andrew Kelley
cdaa735b2b self-hosted: build tries to find llvm-config.exe 2017-12-12 16:40:04 -05:00
Andrew Kelley
2b9302107f self-hosted: cleanup build looking for llvm-config 2017-12-12 16:03:20 -05:00
Andrew Kelley
cd5fd653d7 self-hosted: move code to std.os.ChildProcess.exec 2017-12-12 14:35:53 -05:00
Andrew Kelley
caa6433b56 stack traces: support DW_AT_ranges
This makes some cases print stack traces where it previously failed.
2017-12-12 12:05:28 -05:00
Andrew Kelley
23058d8b43 self-hosted: link with LLVM 2017-12-11 23:34:59 -05:00
Andrew Kelley
ed4d94a5d5 self-hosted: test all out of memory conditions 2017-12-11 21:12:47 -05:00
Andrew Kelley
c4e7d05ce3 refactor debug.global_allocator into mem.FixedBufferAllocator 2017-12-11 17:27:31 -05:00
Andrew Kelley
d8d379faf1 self-hosted: refactor into multiple files
add return expression
add number literal
2017-12-11 16:18:06 -05:00
Andrew Kelley
a3a590a32a self-hosted: workaround for issue #537 2017-12-11 14:47:20 -05:00
Andrew Kelley
fd6a36a235 self-hosted: parsing and rendering blocks 2017-12-11 09:21:06 -05:00
Andrew Kelley
9a51091a5c self-hosted: clean up parser 2017-12-10 23:19:01 -05:00
Andrew Kelley
f951bcf01b self-hosted: parse variable declarations with types 2017-12-10 23:02:45 -05:00
Andrew Kelley
53d58684a6 self-hosted: parse var decls 2017-12-10 22:44:04 -05:00
Andrew Kelley
f210f17d30 add self-hosted parsing and rendering to main tests 2017-12-10 21:26:52 -05:00
Andrew Kelley
4b1d120f58 Merge remote-tracking branch 'origin/master' into self-hosted 2017-12-10 19:41:01 -05:00
Andrew Kelley
dc2e3465c7 rendering source code without recursion 2017-12-10 19:40:46 -05:00
Andrew Kelley
22dc713a2f mem.Allocator initializes bytes to undefined 2017-12-10 15:38:05 -05:00
Andrew Kelley
990db3c35a rename @EnumTagType to @TagType in type names 2017-12-10 15:03:57 -05:00
Andrew Kelley
62ead3a2ee parsing an extern fn declaration 2017-12-09 20:50:31 -05:00
Andrew Kelley
e9efa74333 partial parameter decl parsing 2017-12-09 20:01:13 -05:00
Andrew Kelley
f466e539ef tokenizing libc hello world 2017-12-08 23:56:07 -05:00
Andrew Kelley
d431b0fb99 parse a simple variable declaration 2017-12-08 23:15:43 -05:00
Andrew Kelley
5ead3244a2 Merge remote-tracking branch 'origin/master' into self-hosted 2017-12-08 23:15:07 -05:00
Andrew Kelley
756a218e27 add implicit cast from enum tag type of union to const ptr to the union
closes #654
2017-12-08 17:49:14 -05:00
Andrew Kelley
18cf256817 Merge branch 'master' into self-hosted 2017-12-08 16:39:00 -05:00
Andrew Kelley
3577a80bb6 translate-c: more complex logic for translating a C cast in a macro 2017-12-08 12:28:21 -05:00
Andrew Kelley
0dd3bbf6e8 Merge branch 'master' into self-hosted 2017-12-07 14:22:41 -05:00
Andrew Kelley
182cf5b8de translate-c: support macros with pointer casting 2017-12-07 12:27:29 -05:00
Andrew Kelley
dc502042d5 translate-c: refactor prefix and suffix op C macro parsing 2017-12-07 11:52:52 -05:00
Andrew Kelley
37fbf01755 awkward void union field syntax no longer needed 2017-12-06 21:41:38 -05:00
Andrew Kelley
18b8a625f5 upgrade to new args api 2017-12-06 18:22:52 -05:00
Andrew Kelley
7c91a055c1 Merge branch 'master' into self-hosted 2017-12-06 18:20:02 -05:00
Andrew Kelley
62c25af802 add higher level arg-parsing API + misc. changes
* add @noInlineCall - see #640
   This fixes a crash in --release-safe and --release-fast modes
   where the optimizer inlines everything into _start and
   clobbers the command line argument data.
   If we were able to verify that the user's code never reads
   command line args, we could leave off this "no inline"
   attribute.
 * add i29 and u29 primitive types. u29 is the type of alignment,
   so it makes sense to be a primitive.
   probably in the future we'll make any `i` or `u` followed by
   digits into a primitive.
 * add `aligned` functions to Allocator interface
 * add `os.argsAlloc` and `os.argsFree` so that you can get
   a `[]const []u8`, do whatever arg parsing you want, and then free
   it. For now this uses the other API under the hood, but it could
   be reimplemented to do a single allocation.
 * add tests to make sure command line argument parsing works.
2017-12-06 18:12:05 -05:00
Andrew Kelley
04612d25d7 Merge branch 'master' into self-hosted 2017-12-06 14:58:24 -05:00
Andrew Kelley
249cb2aa30 fix regressions from previous commit
c49ee9f632 broke the tests
and this fixes them
2017-12-05 22:39:36 -05:00
Andrew Kelley
f464fe14f4 switch on enum which only has 1 field is comptime known
closes #593
2017-12-05 22:26:17 -05:00
Andrew Kelley
bb6b4f8db2 fix enum with 1 member causing segfault
closes #647
2017-12-05 22:15:33 -05:00
Andrew Kelley
c49ee9f632 allow union and its tag type to peer resolve to the tag type 2017-12-05 21:33:24 -05:00
Andrew Kelley
2715f6fdb8 allow implicit cast from union to its enum tag type
closes #642
2017-12-05 21:10:47 -05:00
Andrew Kelley
b66fb7ceae revert to master branch ir.cpp, fixes issue better than this branch 2017-12-05 20:51:49 -05:00
Andrew Kelley
6018dbd339 Merge branch 'master' into self-hosted 2017-12-05 20:49:03 -05:00
Andrew Kelley
960914a073 add implicit cast from enum to union
when the enum is the tag type of the union and is comptime known
to be of a void field of the union

See #642
2017-12-05 20:46:58 -05:00
Andrew Kelley
63a2f9a8b2 fix casting integer literal to enum 2017-12-05 18:09:22 -05:00
Andrew Kelley
74cea89fce translate-c: fix not printing clang errors 2017-12-05 12:28:59 -05:00
Andrew Kelley
08d531143f parser skeleton 2017-12-05 00:20:23 -05:00
Andrew Kelley
3976981ab3 tokenizing hello world 2017-12-04 23:40:33 -05:00
Andrew Kelley
7297baa9c6 tokenizing basic operators 2017-12-04 23:29:39 -05:00
Andrew Kelley
07898cc0df tokenizing string literals 2017-12-04 23:25:59 -05:00
Andrew Kelley
798dbe487b simple tokenization 2017-12-04 23:09:03 -05:00
Andrew Kelley
31d9dc3539 read a file 2017-12-04 22:05:27 -05:00
Andrew Kelley
fe39ca01bc Merge remote-tracking branch 'origin/master' into llvm6 2017-12-04 17:45:21 -05:00
Andrew Kelley
5ebed1c9ee fix incorrect LLVM IR for union constant when active field is void
found in the llvm6 branch with llvm assertions on
2017-12-04 17:10:46 -05:00
Andrew Kelley
42004f9013 Merge branch 'master' into llvm6 2017-12-04 15:28:17 -05:00
Andrew Kelley
a966275e50 rename builtin.is_big_endian to builtin.endian
See #307
2017-12-04 10:36:31 -05:00
Andrew Kelley
67e6d9bc30 Merge pull request #644 from Dubhead/Dubhead-fix-message-color
Fix the color of compiler messages for light-themed terminal.
2017-12-04 09:15:17 -05:00
MIURA Masahiro
fea016afc0 Fix the color of compiler messages for light-themed terminal. 2017-12-04 19:22:34 +09:00
Andrew Kelley
76f3bdfff8 add test for casting union to tag type of union 2017-12-04 02:12:13 -05:00
Andrew Kelley
dd3437d5ba fix build on windows 2017-12-04 02:08:26 -05:00
Andrew Kelley
54138d9e82 add test for union with 1 void field being 0 bits 2017-12-04 02:05:33 -05:00
Andrew Kelley
084911d9b3 add test for @sizeOf on extern and packed unions 2017-12-04 02:04:08 -05:00
Andrew Kelley
942b250895 update docs regarding enums and unions 2017-12-04 01:43:06 -05:00
Andrew Kelley
05d9f07541 more tests for unions
See #618
2017-12-04 00:56:27 -05:00
Andrew Kelley
fce435db26 fix abi alignment of union-enums not counting tag type
add more tests for unions

See #618
2017-12-04 00:32:12 -05:00
Andrew Kelley
5a8367e892 rename @EnumTagType to @TagType. add tests for union-enums
See #618
2017-12-03 22:36:01 -05:00
Andrew Kelley
0ad1239522 rework enums and unions and their relationship to each other
 * @enumTagName renamed to @tagName and it works on enums and
   union-enums
 * Remove the EnumTag type. Now there is only enum and union,
   and the tag type of a union is always an enum.
 * unions support specifying the tag enum type, and they support
   inferring an enum tag type.
 * Enums no longer support field types but they do support
   setting the tag values. Likewise union-enums when inferring
   an enum tag type support setting the tag values.
 * It is now an error for enums and unions to have 0 fields.
 * switch statements support union-enums

closes #618
2017-12-03 20:43:56 -05:00
Andrew Kelley
137c8f5e8a ability to set tag values of enums
also remove support for enums with 0 values

closes #305
2017-12-02 22:32:39 -05:00
Andrew Kelley
98237f7c0b casting between integer and enum only works via tag type
See #305
2017-12-02 17:12:37 -05:00
Josh Wolfe
54a0db0daf todo: fix #639 2017-12-01 19:54:01 -07:00
Josh Wolfe
67b8b00c44 implement insertion sort. something's broken 2017-12-01 16:11:39 -07:00
Andrew Kelley
921825b4c0 Merge branch 'llvm5.0.1' 2017-12-01 13:51:53 -05:00
Andrew Kelley
cf96b6f87b update to LLVM 5.0.1rc2 2017-12-01 13:44:28 -05:00
Andrew Kelley
bdd5241615 update c_headers to llvm 5.0.1rc2 2017-12-01 12:15:19 -05:00
Andrew Kelley
a206ef34bb LLD patch: Fix the ASM code generated for __stub_helpers section
This applies 93ca847862af07632197dcf2d8a68b9b27a26d7a
from the llvm-project git monorepo to the embedded LLD.
2017-12-01 12:11:55 -05:00
Andrew Kelley
ddca67a2b9 LLD patch: workaround for buggy MACH-O code
This reapplies 1a1414fc42
to the embedded LLD.
2017-12-01 12:09:55 -05:00
Andrew Kelley
fa45407e78 LLD patch: Fix for LLD on linker scripts with empty sections
This reapplies 569cf286ff
to the embedded LLD.
2017-12-01 12:08:16 -05:00
Andrew Kelley
9ea23272fa LLD patch: COFF: better behavior when using as a library
This applies de776439b61fb71c1256ad86238799c758c66048
from the LLVM git monorepo to the embedded LLD.
2017-12-01 12:06:33 -05:00
Andrew Kelley
77b530b50a updated embedded LLD to 5.0.1rc2 2017-12-01 11:59:14 -05:00
Andrew Kelley
b4120423a5 translate-c: only emit enum tag type if not c_int or c_uint 2017-12-01 00:37:15 -05:00
Andrew Kelley
264c86853b packed structs can have enums with explicit tag types
See #305
2017-12-01 00:34:29 -05:00
Andrew Kelley
b62e2fd870 ability to specify tag type of enums
see #305
2017-11-30 22:08:11 -05:00
Josh Wolfe
5786df933d add mem.readIntLE and readIntBE 2017-11-30 11:20:50 -07:00
Andrew Kelley
210d0017c4 fix build broken by previous commit
now we report a compile error for unusual failures from translate-c
2017-11-29 23:09:35 -05:00
Andrew Kelley
7729f6cf4e translate-c: support static incomplete array inside function 2017-11-29 21:50:38 -05:00
Andrew Kelley
716b0b8655 fix capturing value of switch with all unreachable prongs
closes #635
2017-11-29 21:34:17 -05:00
Andrew Kelley
ccea8dcbf6 better error code for File.getEndPos failure 2017-11-29 21:34:17 -05:00
Josh Wolfe
88a7f203f9 add Buffer.appendFormat() 2017-11-29 19:31:09 -07:00
Josh Wolfe
418b0967fc fix os.Dir compile errors 2017-11-29 17:52:58 -07:00
Andrew Kelley
afe3aae582 Merge remote-tracking branch 'origin/llvm6' into llvm6 2017-11-29 19:12:55 -05:00
Andrew Kelley
d4cd4a35d5 update fast math llvm API to latest 2017-11-29 19:11:34 -05:00
Andrew Kelley
91ef68f9b1 Merge remote-tracking branch 'origin/master' into llvm6 2017-11-29 16:34:50 -05:00
Andrew Kelley
7066283004 translate-c: support const ptr initializer 2017-11-28 23:44:45 -05:00
Andrew Kelley
26096e79d1 translate-c: fix clobbering primitive types 2017-11-28 03:17:28 -05:00
Andrew Kelley
8d5c4a67a7 Merge branch 'dimenus-c-field-expr' 2017-11-28 03:00:13 -05:00
Andrew Kelley
e745544dac translate-c: detect macros referencing field lookup
as fn calls which assert the fn ptr is non-null
2017-11-28 02:58:51 -05:00
Andrew Kelley
f537c51f25 Merge branch 'c-field-expr' of https://github.com/dimenus/zig into dimenus-c-field-expr 2017-11-28 00:44:16 -05:00
Andrew Kelley
1ab84a27d3 translate-c: fix sometimes getting (no file) warnings
Thanks to Mason Remaley for testing the fix.
2017-11-28 00:32:32 -05:00
Mason Remaley
3e8fd24547 Implements translation for the prefix not operator (#628) 2017-11-27 21:00:05 -05:00
Ryan Saunderson
57049b95b3 Resolving merge w/ upstream master 2017-11-27 11:42:48 -06:00
dimenus
04472f57be Added support for exporting of C field expressions 2017-11-27 11:23:14 -06:00
Andrew Kelley
671183fa9a translate-c: support pointer casting
also avoid some unnecessary casts
2017-11-26 20:05:55 -05:00
Andrew Kelley
93fac5f257 translate-c: support variable name shadowing 2017-11-26 17:30:43 -05:00
Andrew Kelley
9a8545d590 translate-c: fix translation when no default switch case 2017-11-26 16:03:56 -05:00
Andrew Kelley
aa2ca3f02c translate-c: better way to translate switch
previously `continue` would be handled incorrectly
2017-11-26 15:58:49 -05:00
Andrew Kelley
1b0e90f70b translate-c supports switch statements 2017-11-26 00:58:11 -05:00
Andrew Kelley
687e359291 translate-c: avoid global state and introduce var decl scopes
in preparation to implement switch and solve variable name collisions
2017-11-25 22:17:24 -05:00
Andrew Kelley
df0e875856 translate-c: introduce the concept of scopes
in preparation to implement switch and solve variable name collisions
2017-11-25 20:34:05 -05:00
Andrew Kelley
a2afcae9ff fix crash when constant inside comptime function has compile error
closes #625
2017-11-25 18:16:33 -05:00
Andrew Kelley
48ebb65cc7 add an assert to catch corrupted memory 2017-11-25 16:34:08 -05:00
Andrew Kelley
b390929826 translate-c supports break and continue 2017-11-25 11:56:17 -05:00
Andrew Kelley
bf20b260ce translate-c supports for loops 2017-11-25 00:57:48 -05:00
Andrew Kelley
18eb3c5f90 translate-c supports returning void 2017-11-25 00:25:47 -05:00
Andrew Kelley
cd36baf530 fix assertion failed when invalid type encountered 2017-11-24 22:04:24 -05:00
Andrew Kelley
40480c7cdc translate-c supports string literals 2017-11-24 19:26:05 -05:00
Andrew Kelley
68312afcdf translate-c: support pre increment and decrement operators 2017-11-24 16:36:39 -05:00
Andrew Kelley
741504862c update homepage docs 2017-11-24 15:06:12 -05:00
Andrew Kelley
5a25505668 rename "parsec" to "translate-c" 2017-11-24 14:56:05 -05:00
Josh Wolfe
afbbdb2c67 move base64 functions into structs 2017-11-20 23:26:45 -07:00
Josh Wolfe
a44283b0b2 rework std.base64 api
* rename decode to decodeExactUnsafe.
* add decodeExact, which checks for invalid chars and padding.
* add decodeWithIgnore, which also allows ignoring chars.
* alphabets are supplied to the decoders with their
  char-to-index mapping already built, which enables it to be
  done at comptime.
* all decode/encode apis except decodeWithIgnore require dest
  to be the exactly correct length. This is calculated by a
  calc function corresponding to each api. These apis no longer
  return the dest parameter.
* for decodeWithIgnore, an exact size cannot be known a priori.
  Instead, a calc function gives an upper bound, and a runtime
  error is returned in case of overflow. decodeWithIgnore
  returns the number of bytes written to dest.

closes #611
2017-11-20 23:26:45 -07:00
Andrew Kelley
339d48ac15 parse-c: support address of operator 2017-11-17 12:11:03 -05:00
Andrew Kelley
3e835973db Merge pull request #617 from dimenus/dll-load
Added DLL loading capability in windows to the std lib.
2017-11-17 10:24:34 -05:00
Andrew Kelley
b50c676f76 add parse-c support for unions 2017-11-16 23:54:33 -05:00
dimenus
a7d07d412c Added DLL loading capability in windows to the std lib. 2017-11-16 21:49:05 -06:00
Andrew Kelley
d108689382 Merge branch 'unions'
closes #144
2017-11-16 22:14:50 -05:00
Andrew Kelley
1473eb9ae0 add documentation placeholders for unions 2017-11-16 22:13:20 -05:00
Andrew Kelley
5d2ba056c8 fix codegen for union init with runtime value
see #144
2017-11-16 22:06:08 -05:00
Andrew Kelley
e26ccd5166 debug safety for unions 2017-11-16 21:15:15 -05:00
Andrew Kelley
f12d36641f union secret field is the tag index instead of distinct type index
See #144
2017-11-16 10:06:58 -05:00
Andrew Kelley
018cbff438 unions have a secret field for the type
See #144
2017-11-15 22:52:47 -05:00
Andrew Kelley
3740bfa3bf update fast math flags for latest llvm 2017-11-15 22:32:57 -05:00
Andrew Kelley
a984040fae Merge remote-tracking branch 'origin/master' into llvm6 2017-11-15 22:32:23 -05:00
Andrew Kelley
9a4da6c8d8 Merge branch 'master' into llvm6 2017-11-15 22:24:42 -05:00
Andrew Kelley
f276fd0f37 basic union support
See #144
2017-11-15 13:04:18 -05:00
Andrew Kelley
7a74dbadd7 add docs for std.base64 2017-11-14 17:58:58 -05:00
Ryan Saunderson
371e578151 Merge remote-tracking branch 'upstream/master' into llvm6 2017-11-14 07:00:27 -06:00
Andrew Kelley
5029322aa1 c-to-zig: handle UO_Deref 2017-11-14 02:10:13 -05:00
Josh Wolfe
6ffaf4c2e2 parsec supports do loop 2017-11-13 22:56:20 -07:00
Josh Wolfe
012ce1481e parsec supports post increment/decrement with used result 2017-11-13 22:19:51 -07:00
Josh Wolfe
4c2cdf6f4d parsec supports more compound assign operators 2017-11-13 21:37:30 -07:00
Josh Wolfe
c1fde0e8c4 parsec supports bitshift operators 2017-11-13 20:49:53 -07:00
Andrew Kelley
6356724057 Merge branch 'dimenus-parsec' 2017-11-13 22:33:58 -05:00
Andrew Kelley
03732860be add test case for previous commit 2017-11-13 22:33:41 -05:00
Andrew Kelley
df07361642 Merge branch 'parsec' of https://github.com/dimenus/zig into dimenus-parsec 2017-11-13 22:26:31 -05:00
Josh Wolfe
57cd074959 parsec supports C comma operator 2017-11-13 19:59:32 -07:00
Josh Wolfe
1f28fcdec5 parsec supports C NULL to pointer implicit cast 2017-11-13 19:39:46 -07:00
dimenus
b3b4786c24 Fixed duplicate decl detection for typedefs/enums 2017-11-13 20:10:36 -06:00
dimenus
98e3c7911c Fixed duplicate decl detection for typedefs/enums 2017-11-13 16:37:46 -06:00
Andrew Kelley
a890380b6a fix windows trying to run linux-only tests 2017-11-10 18:29:49 -05:00
Andrew Kelley
ca87f55a7b Merge branch 'bscheinman-linux_timer' 2017-11-10 18:25:32 -05:00
Andrew Kelley
5ae53dacfb rename test 2017-11-10 18:24:52 -05:00
Andrew Kelley
5895204c99 Merge branch 'linux_timer' of https://github.com/bscheinman/zig into bscheinman-linux_timer 2017-11-10 18:18:03 -05:00
Brendon Scheinman
87407b54b6 add epoll and timerfd support on linux 2017-11-10 15:12:46 -08:00
Andrew Kelley
1403748fd8 disable broken 32 bit windows test
See #537
2017-11-10 17:08:11 -05:00
Andrew Kelley
df89291d1c Merge remote-tracking branch 'origin/master' into llvm6 2017-11-10 16:45:01 -05:00
Andrew Kelley
019f18058b fix test failures
put all the codegen for fn prototypes to the same place
2017-11-10 16:32:37 -05:00
Andrew Kelley
403a46abcc fix test failure on 32 bit windows 2017-11-10 16:03:14 -05:00
Andrew Kelley
6bf1547148 Merge branch 'darwin-stat'
closes #606
2017-11-10 15:01:09 -05:00
Andrew Kelley
029d37d6a7 fix bug when multiple function definitions exist
This might be related to #529
2017-11-10 14:58:50 -05:00
Andrew Kelley
20c2dbdbd3 add windows implementation of io.File.getEndPos 2017-11-10 14:36:03 -05:00
Andrew Kelley
1ac46fac15 add a std lib test for reading and writing files
 * fix fstat wrong on darwin
 * move std.debug.global_allocator to std.debug.global_allocator_state and make it private
 * add std.debug.global_allocator as a pointer (to upgrade your zig code remove
   the '&')
2017-11-10 14:17:23 -05:00
dimenus
e9d7623e1f Merge remote-tracking branch 'origin/master' into llvm6 2017-11-10 09:49:45 -06:00
Jeff Fowler
336d81894d Fix Stat include in darwin land (#605) 2017-11-09 13:46:53 -05:00
Jeff Fowler
52521d5f67 fix typo on darwin lseek (#602) 2017-11-09 11:35:35 -05:00
Andrew Kelley
7ea669e04c fix parameter of extern var args not type checked
closes #601
2017-11-09 11:30:39 -05:00
Andrew Kelley
4f8c26d2c6 fix enum sizes too large
closes #598
2017-11-08 21:44:10 -05:00
Andrew Kelley
53b18c8542 fix travis linux script 2017-11-07 09:06:29 -05:00
Andrew Kelley
4543413491 std.io: introduce buffered I/O and change API
I started working on #465 and made some corresponding std.io
API changes.

New structs:
 * std.io.FileInStream
 * std.io.FileOutStream
 * std.io.BufferedOutStream
 * std.io.BufferedInStream

Removed:
 * std.io.File.in_stream
 * std.io.File.out_stream

Now instead of &file.out_stream or &file.in_stream to get access to
the stream API for a file, you get it like this:

var file_in_stream = io.FileInStream.init(&file);
const in_stream = &file_in_stream.stream;

var file_out_stream = io.FileOutStream.init(&file);
const out_stream = &file_out_stream.stream;

This is evidence that we might not need any OOP features -
See #130.
2017-11-07 03:22:27 -05:00
Andrew Kelley
3a600297ca Merge remote-tracking branch 'origin/master' into llvm6 2017-11-06 22:41:12 -05:00
Andrew Kelley
634e8713c3 add @memberType and @memberName builtin functions
see #383

there is a plan to unify most of the reflection into 2
builtin functions, as outlined in the above issue,
but this gives us needed features for now, and we can
iterate on the design in future commits
2017-11-06 22:07:19 -05:00
scurest
f0dafd3f20 fix typos in std.io (#589)
Fixes a bug that prevented InStream.readAllAlloc from compiling.
2017-11-06 11:40:58 -05:00
Andrew Kelley
52a2992862 Merge pull request #587 from scurest/c_alloc_redeclaration_of_mem
Fix #585
2017-11-05 19:38:50 -05:00
scurest
48c8181886 fix redeclaration of mem (#585) 2017-11-05 15:46:54 -06:00
scurest
bd6f8d99c5 add test for c_allocator 2017-11-05 15:46:10 -06:00
Andrew Kelley
4cc9fe90a8 fix build on MacOS 2017-11-04 16:40:55 -04:00
Andrew Kelley
f0d755153d add compile-time reflection for function arg types
See #383
2017-11-04 16:20:02 -04:00
Andrew Kelley
4a6df04f75 slightly more verbose error message when building object file fails 2017-11-03 20:07:32 -04:00
Andrew Kelley
75afe73c66 Merge pull request #581 from Dimenus/line_endings
Add support for windows line endings with c macros within a c_import.
2017-11-03 18:40:38 -04:00
Andrew Kelley
d4c1ed95ac Merge pull request #583 from Dimenus/libc_runtime
Win32 libc runtime fixes.
2017-11-03 18:32:03 -04:00
dimenus
1890760206 Windows libc & static libc are located in the same dir which is already covered by msvc_lib_dir 2017-11-03 17:09:35 -05:00
dimenus
1ef6cb1b64 Add support for windows line endings with c macros 2017-11-03 16:29:49 -05:00
Marc Tiehuis
795703a39c Add emit command-line option (#580)
2017-11-03 09:09:33 -04:00
Andrew Kelley
a31b23c46b more compile-time type reflection
See #383
2017-11-03 00:00:57 -04:00
Andrew Kelley
dc8b011d61 fix incorrect debug info for empty structs
closes #579

now all tests pass for llvm master branch
2017-11-02 21:57:55 -04:00
Andrew Kelley
4a82c2d124 fix incorrect debug info for empty structs
now all tests pass for llvm master branch
2017-11-02 21:54:24 -04:00
Andrew Kelley
188fd47a51 add missing environment 2017-11-02 21:54:24 -04:00
Andrew Kelley
9a99bd3a71 use llvm named structs for const values when possible
normally we want to use llvm types for constants. but
union constants (which are found inside enums) when
they are initialized with the non-most-aligned-member
must be unnamed structs.

these bubble up to all aggregate types. if a constant of
an aggregate type contains, recursively, a union constant
with a non-most-aligned-member initialized, the aggregate
typed constant must be unnamed too.

this fixes all the asserts that were coming in from
llvm master branch.
2017-11-02 21:54:24 -04:00
Andrew Kelley
94ec2190f8 update to llvm master 2017-11-02 21:54:24 -04:00
Andrew Kelley
abff1b6884 windows: use the same libc search within a compilation unit 2017-11-01 23:08:34 -04:00
Andrew Kelley
f7837f445e bump build_runner allocator to use 30 MB 2017-11-01 16:46:10 -04:00
Dimenus
38f05d4ac5 WIN32: Linking with the CRT at runtime. (#570)
Disclaimer: Forgive me if my format sucks, I've never submitted a PR before!

Fixes: #517 

I added a few things to allow zig to link with the CRT properly both statically and dynamically. In Visual Studio 2017, Microsoft changed how the c-runtime is factored again. With this change, they also added a COM interface to allow you to query the respective Visual Studio instance for two of them. This does that and also falls back on a registry query for 2015 support. If you're using a Visual Studio instance older than 2015, you'll have to use the existing options available with the zig compiler. Changes are listed below along with a general description of the changes.

all_types.cpp:

The separate variables for msvc/kern32 have been removed and all win32 libc directory paths have been combined into a ZigList since we're querying more than two directories and differentiating one from another doesn't matter to lld.

analyze.cpp:

The existing functions were extended to support querying libc libs & libc headers at runtime.

codegen.cpp/hpp:

Microsoft uses the new 'Universal C Runtime' name now. Doesn't matter from a functionality standpoint. I left the compiler switches as is to not introduce any breaking changes.

link.cpp:

We're linking 4 libs and generating another in order to support the UCRT.
Dynamic: msvcrt/d, vcruntime/d, ucrt/d, legacy_stdio_definitions.lib
Static: libcmt/d, libvcruntime/d, libucrt/d, legacy_stdio_definitions.lib

main.cpp:

Update function call names.

os.cpp/hpp:

COM/Registry interface for querying Windows UCRT/SDK.

Sources:
[Windows CRT](https://docs.microsoft.com/en-us/cpp/c-runtime-library/crt-library-features)
[VS 2015 Breaking Changes](https://msdn.microsoft.com/en-us/library/bb531344.aspx)
2017-11-01 15:33:14 -04:00
Andreas Haferburg
b35689b70d Enforce "\n" line endings on Windows (#574)
With Windows line endings, which seems to be the default on Windows, the
zig compiler won't understand std out of the box. This project should
not rely on git's global core.autocrlf setting.
2017-11-01 10:31:32 -04:00
Andrew Kelley
25972be45c fix windows build from previous commit 2017-10-31 22:24:02 -04:00
Andrew Kelley
9e234d4208 breaking change to std.io API
 * Merge io.InStream and io.OutStream into io.File
 * Introduce io.OutStream and io.InStream interfaces
   - io.File implements both of these
 * Move mem.IncrementingAllocator to heap.IncrementingAllocator

Instead of:

```
%return std.io.stderr.printf("hello\n");
```

now do:

```
std.debug.warn("hello\n");
```

To print to stdout, see `io.getStdOut()`.

 * Rename std.ArrayList.resizeDown to std.ArrayList.shrink.
2017-10-31 04:47:55 -04:00
Andrew Kelley
7a96aca39e Merge branch 'master' into self-hosted 2017-10-27 12:54:46 -04:00
Andrew Kelley
1a414c7b6b delete -municode command line argument
The solution to this is to always have it on and only
use the 'W' versions of respective windows APIs.

See the issue for this.
2017-10-27 01:29:58 -04:00
Andrew Kelley
540bac0928 Merge branch 'master' into self-hosted 2017-10-27 01:28:08 -04:00
Andrew Kelley
4c306af4eb add test case for previous commit 2017-10-27 01:22:48 -04:00
Andrew Kelley
f1072d0d9f use llvm named structs for const values when possible
normally we want to use llvm types for constants. but
union constants (which are found inside enums) when
they are initialized with the non-most-aligned-member
must be unnamed structs.

these bubble up to all aggregate types. if a constant of
an aggregate type contains, recursively, a union constant
with a non-most-aligned-member initialized, the aggregate
typed constant must be unnamed too.

this fixes some of the asserts that were coming in from
llvm master branch.
2017-10-27 00:14:56 -04:00
Marc Tiehuis
6663638195 Improve invalid character error messages (#566)
See #544
2017-10-26 10:00:23 -04:00
Andrew Kelley
f4ca3482f1 add guard to c_headers for duplicate va_list on darwin 2017-10-26 01:11:57 -04:00
Andrew Kelley
c7053bea20 better output when @cImport generates invalid zig 2017-10-26 00:32:30 -04:00
Andrew Kelley
300c83d893 fix crash on field access of opaque type 2017-10-25 23:18:18 -04:00
Andrew Kelley
5f28a9d238 cleaner verbose flags and zig build prints failed command 2017-10-25 23:10:41 -04:00
Andrew Kelley
6764a45223 Merge branch 'better-float-printing' 2017-10-24 21:58:09 -04:00
Andrew Kelley
73fe5f63c6 add some sanity tests for float printing 2017-10-24 21:57:58 -04:00
Andrew Kelley
1e784839f1 Merge branch 'float-printing' of https://github.com/scurest/zig into better-float-printing 2017-10-24 21:44:49 -04:00
Andrew Kelley
1828f8eb8e fix missing compiler_rt in release modes
the optimizer was deleting compiler_rt symbols, so I changed
the linkage type from LinkOnce to Weak

also changed LinkOnce to mean linkonce_odr in llvm and
Weak to mean weak_odr in llvm.

See #563
2017-10-24 21:31:47 -04:00
scurest
262b7428cf More corrections to float printing
Testing suggests all f32s are now printed accurately.
2017-10-24 14:18:50 -05:00
Andrew Kelley
4f4da3c10c wip self hosted code 2017-10-24 10:08:20 -04:00
Andrew Kelley
d7e28f991d remove CXX ABI workaround
the actual solution is you must compile zig with the same
compiler that compiled llvm, lld, and clang.

reverts 8d60ffe314
2017-10-23 22:37:59 -04:00
Andrew Kelley
643ab90ace add maximum value for @setAlignStack 2017-10-23 22:33:00 -04:00
scurest
03a0dfbeca Print better floats 2017-10-23 15:40:49 -05:00
Andrew Kelley
92751d5e24 self hosted zig: print usage 2017-10-21 17:31:06 -04:00
Andrew Kelley
c1642355f0 parse-c: improve performance
previously we did linear search to find existing global
declarations; now we index using a hash map.

building tetris went from taking 5.3 sec to 0.76 sec
2017-10-21 16:46:33 -04:00
Andrew Kelley
a1af7cbf00 report compile error instead of crashing for void in var args
See #557
2017-10-21 15:46:04 -04:00
Andrew Kelley
175893913d fix compiler crash regarding type name of undefined
See #547
2017-10-21 13:14:10 -04:00
Andrew Kelley
9b91c76088 std.fmt.format supports ints smaller than u8
closes #546

thanks to @Dimenus for the fix
2017-10-21 13:03:08 -04:00
Andrew Kelley
b3d12d2c9e zig build: fix system libraries not respected for C artifacts
closes #550
2017-10-21 12:58:47 -04:00
Andrew Kelley
3c3af4b332 fix docs link 2017-10-17 16:05:46 -04:00
Andrew Kelley
a27c0dd591 remove unsupported targets from readme
See #438
2017-10-17 14:15:50 -04:00
Andrew Kelley
78cb4ce030 Release 0.1.1 2017-10-17 08:50:00 -04:00
Andrew Kelley
79193ffed2 build: fix logic for version when there is a git tag 2017-10-17 08:47:27 -04:00
1621 changed files with 97740 additions and 31862 deletions

1
.gitattributes

@@ -0,0 +1 @@
*.zig text eol=lf


105
README.md

@@ -5,8 +5,6 @@ clarity.
[ziglang.org](http://ziglang.org)
[Documentation](http://ziglang.org/documentation/)
## Feature Highlights
* Small, simple language. Focus on debugging your application rather than
@@ -26,7 +24,7 @@ clarity.
always compiled against statically in source form. Compile units do not
depend on libc unless explicitly linked.
* Nullable type instead of null pointers.
* Tagged union type instead of raw unions.
* Safe unions, tagged unions, and C ABI compatible unions.
* Generics so that one can write efficient data structures that work for any
data type.
* No header files required. Top level declarations are entirely
@@ -35,7 +33,7 @@ clarity.
* Partial compile-time function evaluation which eliminates the need for
a preprocessor or macros.
* The binaries produced by Zig have complete debugging information so you can,
for example, use GDB to debug your software.
for example, use GDB or MSVC to debug your software.
* Built-in unit tests with `zig test`.
* Friendly toward package maintainers. Reproducible build, bootstrapping
process carefully documented. Issues filed by package maintainers are
@@ -54,35 +52,21 @@ that counts as "freestanding" for the purposes of this table.
| | freestanding | linux | macosx | windows | other |
|-------------|--------------|---------|---------|---------|---------|
|i386 | OK | planned | OK | OK | planned |
|i386 | OK | planned | OK | planned | planned |
|x86_64 | OK | OK | OK | OK | planned |
|arm | OK | planned | planned | N/A | planned |
|aarch64 | OK | planned | planned | planned | planned |
|avr | OK | planned | planned | N/A | planned |
|bpf | OK | planned | planned | N/A | planned |
|hexagon | OK | planned | planned | N/A | planned |
|mips | OK | planned | planned | N/A | planned |
|msp430 | OK | planned | planned | N/A | planned |
|nios2 | OK | planned | planned | N/A | planned |
|powerpc | OK | planned | planned | N/A | planned |
|r600 | OK | planned | planned | N/A | planned |
|amdgcn | OK | planned | planned | N/A | planned |
|riscv | OK | planned | planned | N/A | planned |
|sparc | OK | planned | planned | N/A | planned |
|s390x | OK | planned | planned | N/A | planned |
|tce | OK | planned | planned | N/A | planned |
|thumb | OK | planned | planned | N/A | planned |
|xcore | OK | planned | planned | N/A | planned |
|nvptx | OK | planned | planned | N/A | planned |
|le | OK | planned | planned | N/A | planned |
|amdil | OK | planned | planned | N/A | planned |
|hsail | OK | planned | planned | N/A | planned |
|spir | OK | planned | planned | N/A | planned |
|kalimba | OK | planned | planned | N/A | planned |
|shave | OK | planned | planned | N/A | planned |
|lanai | OK | planned | planned | N/A | planned |
|wasm | OK | N/A | N/A | N/A | N/A |
|renderscript | OK | N/A | N/A | N/A | N/A |
## Community
@@ -92,10 +76,10 @@ that counts as "freestanding" for the purposes of this table.
### Wanted: Windows Developers
Help get the tests passing on Windows, flesh out the standard library for
Windows, streamline Zig installation and distribution for Windows. Work with
LLVM and LLD teams to improve PDB/CodeView/MSVC debugging. Implement stack traces
for Windows in the MinGW environment and the MSVC environment.
Flesh out the standard library for Windows, streamline Zig installation and
distribution for Windows. Work with LLVM and LLD teams to improve
PDB/CodeView/MSVC debugging. Implement stack traces for Windows in the MinGW
environment and the MSVC environment.
### Wanted: MacOS and iOS Developers
@@ -133,31 +117,26 @@ libc. Create demo games using Zig.
[![Build Status](https://travis-ci.org/zig-lang/zig.svg?branch=master)](https://travis-ci.org/zig-lang/zig)
[![Build status](https://ci.appveyor.com/api/projects/status/4t80mk2dmucrc38i/branch/master?svg=true)](https://ci.appveyor.com/project/andrewrk/zig-d3l86/branch/master)
### Dependencies
### Stage 1: Build Zig from C++ Source Code
#### Build Dependencies
These compile tools must be available on your system and are used to build
the Zig compiler itself:
#### Dependencies
##### POSIX
* gcc >= 5.0.0 or clang >= 3.6.0
* cmake >= 2.8.5
* gcc >= 5.0.0 or clang >= 3.6.0
* LLVM, Clang, LLD development libraries == 6.x, compiled with the same gcc or clang version above
- These depend on zlib and libxml2.
##### Windows
* cmake >= 2.8.5
* Microsoft Visual Studio 2015
* LLVM, Clang, LLD development libraries == 6.x, compiled with the same MSVC version above
#### Library Dependencies
#### Instructions
These libraries must be installed on your system, with the development files
available. The Zig compiler links against them. You have to use the same
compiler for these libraries as you do to compile Zig.
* LLVM, Clang, and LLD libraries == 5.x
### Debug / Development Build
##### POSIX
If you have gcc or clang installed, you can find out what `ZIG_LIBC_LIB_DIR`,
`ZIG_LIBC_STATIC_LIB_DIR`, and `ZIG_LIBC_INCLUDE_DIR` should be set to
@@ -172,55 +151,51 @@ make install
./zig build --build-file ../build.zig test
```
#### MacOS
##### MacOS
`ZIG_LIBC_LIB_DIR` and `ZIG_LIBC_STATIC_LIB_DIR` are unused.
```
brew install llvm@5
brew outdated llvm@5 || brew upgrade llvm@5
brew install cmake llvm@6
brew outdated llvm@6 || brew upgrade llvm@6
mkdir build
cd build
cmake .. -DCMAKE_PREFIX_PATH=/usr/local/opt/llvm@5/ -DCMAKE_INSTALL_PREFIX=$(pwd)
cmake .. -DCMAKE_PREFIX_PATH=/usr/local/opt/llvm@6/ -DCMAKE_INSTALL_PREFIX=$(pwd)
make install
./zig build --build-file ../build.zig test
```
#### Windows
##### Windows
See https://github.com/zig-lang/zig/wiki/Building-Zig-on-Windows
### Release / Install Build
### Stage 2: Build Self-Hosted Zig from Zig Source Code
Once installed, `ZIG_LIBC_LIB_DIR` and `ZIG_LIBC_INCLUDE_DIR` can be overridden
by the `--libc-lib-dir` and `--libc-include-dir` parameters to the zig binary.
*Note: Stage 2 compiler is not complete. Beta users of Zig should use the
Stage 1 compiler for now.*
Dependencies are the same as Stage 1, except now you have a working zig compiler.
```
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DZIG_LIBC_LIB_DIR=/some/path -DZIG_LIBC_INCLUDE_DIR=/some/path -DZIG_LIBC_STATIC_INCLUDE_DIR=/some/path
make
sudo make install
bin/zig build --build-file ../build.zig --prefix $(pwd)/stage2 install
```
### Test Coverage
This produces `./stage2/bin/zig` which can be used for testing and development.
Once it is feature complete, it will be used to build stage 3 - the final compiler
binary.
To see test coverage in Zig, configure with `-DZIG_TEST_COVERAGE=ON` as an
additional parameter to the Debug build.
### Stage 3: Rebuild Self-Hosted Zig Using the Self-Hosted Compiler
You must have `lcov` installed and available.
This is the actual compiler binary that we will install to the system.
Then `make coverage`.
#### Debug / Development Build
With GCC you will get a nice HTML view of the coverage data. With clang,
the last step will fail, but you can execute
`llvm-cov gcov $(find CMakeFiles/ -name "*.gcda")` and then inspect the
produced .gcov files.
```
./stage2/bin/zig build --build-file ../build.zig --prefix $(pwd)/stage3 install
```
### Related Projects
#### Release / Install Build
* [zig-mode](https://github.com/AndreaOrru/zig-mode) - Emacs integration
* [zig.vim](https://github.com/zig-lang/zig.vim) - Vim configuration files
* [vscode-zig](https://github.com/zig-lang/vscode-zig) - Visual Studio Code extension
* [zig-compiler-completions](https://github.com/tiehuis/zig-compiler-completions) - bash and zsh completions for the zig compiler
* [NppExtension](https://github.com/ice1000/NppExtension) - Notepad++ syntax highlighting
```
./stage2/bin/zig build --build-file ../build.zig install -Drelease-fast
```

build.zig

@@ -1,10 +1,102 @@
const Builder = @import("std").build.Builder;
const builtin = @import("builtin");
const std = @import("std");
const Builder = std.build.Builder;
const tests = @import("test/tests.zig");
const os = std.os;
const BufMap = std.BufMap;
const warn = std.debug.warn;
const mem = std.mem;
const ArrayList = std.ArrayList;
const Buffer = std.Buffer;
const io = std.io;
pub fn build(b: &Builder) !void {
const mode = b.standardReleaseOptions();
var docgen_exe = b.addExecutable("docgen", "doc/docgen.zig");
const rel_zig_exe = try os.path.relative(b.allocator, b.build_root, b.zig_exe);
var docgen_cmd = b.addCommand(null, b.env_map, [][]const u8 {
docgen_exe.getOutputPath(),
rel_zig_exe,
"doc/langref.html.in",
os.path.join(b.allocator, b.cache_root, "langref.html") catch unreachable,
});
docgen_cmd.step.dependOn(&docgen_exe.step);
const docs_step = b.step("docs", "Build documentation");
docs_step.dependOn(&docgen_cmd.step);
const test_step = b.step("test", "Run all the tests");
// find the stage0 build artifacts because we're going to re-use config.h and zig_cpp library
const build_info = try b.exec([][]const u8{b.zig_exe, "BUILD_INFO"});
var index: usize = 0;
const cmake_binary_dir = nextValue(&index, build_info);
const cxx_compiler = nextValue(&index, build_info);
const llvm_config_exe = nextValue(&index, build_info);
const lld_include_dir = nextValue(&index, build_info);
const lld_libraries = nextValue(&index, build_info);
const std_files = nextValue(&index, build_info);
const c_header_files = nextValue(&index, build_info);
const dia_guids_lib = nextValue(&index, build_info);
const llvm = findLLVM(b, llvm_config_exe) catch unreachable;
var exe = b.addExecutable("zig", "src-self-hosted/main.zig");
exe.setBuildMode(mode);
exe.addIncludeDir("src");
exe.addIncludeDir(cmake_binary_dir);
addCppLib(b, exe, cmake_binary_dir, "zig_cpp");
if (lld_include_dir.len != 0) {
exe.addIncludeDir(lld_include_dir);
var it = mem.split(lld_libraries, ";");
while (it.next()) |lib| {
exe.addObjectFile(lib);
}
} else {
addCppLib(b, exe, cmake_binary_dir, "embedded_lld_elf");
addCppLib(b, exe, cmake_binary_dir, "embedded_lld_coff");
addCppLib(b, exe, cmake_binary_dir, "embedded_lld_lib");
}
dependOnLib(exe, llvm);
if (exe.target.getOs() == builtin.Os.linux) {
const libstdcxx_path_padded = try b.exec([][]const u8{cxx_compiler, "-print-file-name=libstdc++.a"});
const libstdcxx_path = ??mem.split(libstdcxx_path_padded, "\r\n").next();
exe.addObjectFile(libstdcxx_path);
exe.linkSystemLibrary("pthread");
} else if (exe.target.isDarwin()) {
exe.linkSystemLibrary("c++");
}
if (dia_guids_lib.len != 0) {
exe.addObjectFile(dia_guids_lib);
}
if (exe.target.getOs() != builtin.Os.windows) {
exe.linkSystemLibrary("xml2");
}
exe.linkSystemLibrary("c");
b.default_step.dependOn(&exe.step);
const skip_self_hosted = b.option(bool, "skip-self-hosted", "Main test suite skips building self hosted compiler") ?? false;
if (!skip_self_hosted) {
test_step.dependOn(&exe.step);
}
const verbose_link_exe = b.option(bool, "verbose-link", "Print link command for self hosted compiler") ?? false;
exe.setVerboseLink(verbose_link_exe);
b.installArtifact(exe);
installStdLib(b, std_files);
installCHeaders(b, c_header_files);
pub fn build(b: &Builder) {
const test_filter = b.option([]const u8, "test-filter", "Skip tests that do not match filter");
const with_lldb = b.option(bool, "with-lldb", "Run tests in LLDB to get a backtrace if one fails") ?? false;
const test_step = b.step("test", "Run all the tests");
test_step.dependOn(docs_step);
test_step.dependOn(tests.addPkgTests(b, test_filter,
"test/behavior.zig", "behavior", "Run the behavior tests",
@@ -22,6 +114,120 @@ pub fn build(b: &Builder) {
test_step.dependOn(tests.addBuildExampleTests(b, test_filter));
test_step.dependOn(tests.addCompileErrorTests(b, test_filter));
test_step.dependOn(tests.addAssembleAndLinkTests(b, test_filter));
test_step.dependOn(tests.addDebugSafetyTests(b, test_filter));
test_step.dependOn(tests.addParseCTests(b, test_filter));
test_step.dependOn(tests.addRuntimeSafetyTests(b, test_filter));
test_step.dependOn(tests.addTranslateCTests(b, test_filter));
test_step.dependOn(tests.addGenHTests(b, test_filter));
}
fn dependOnLib(lib_exe_obj: &std.build.LibExeObjStep, dep: &const LibraryDep) void {
for (dep.libdirs.toSliceConst()) |lib_dir| {
lib_exe_obj.addLibPath(lib_dir);
}
for (dep.system_libs.toSliceConst()) |lib| {
lib_exe_obj.linkSystemLibrary(lib);
}
for (dep.libs.toSliceConst()) |lib| {
lib_exe_obj.addObjectFile(lib);
}
for (dep.includes.toSliceConst()) |include_path| {
lib_exe_obj.addIncludeDir(include_path);
}
}
fn addCppLib(b: &Builder, lib_exe_obj: &std.build.LibExeObjStep, cmake_binary_dir: []const u8, lib_name: []const u8) void {
const lib_prefix = if (lib_exe_obj.target.isWindows()) "" else "lib";
lib_exe_obj.addObjectFile(os.path.join(b.allocator, cmake_binary_dir, "zig_cpp",
b.fmt("{}{}{}", lib_prefix, lib_name, lib_exe_obj.target.libFileExt())) catch unreachable);
}
const LibraryDep = struct {
libdirs: ArrayList([]const u8),
libs: ArrayList([]const u8),
system_libs: ArrayList([]const u8),
includes: ArrayList([]const u8),
};
fn findLLVM(b: &Builder, llvm_config_exe: []const u8) !LibraryDep {
const libs_output = try b.exec([][]const u8{llvm_config_exe, "--libs", "--system-libs"});
const includes_output = try b.exec([][]const u8{llvm_config_exe, "--includedir"});
const libdir_output = try b.exec([][]const u8{llvm_config_exe, "--libdir"});
var result = LibraryDep {
.libs = ArrayList([]const u8).init(b.allocator),
.system_libs = ArrayList([]const u8).init(b.allocator),
.includes = ArrayList([]const u8).init(b.allocator),
.libdirs = ArrayList([]const u8).init(b.allocator),
};
{
var it = mem.split(libs_output, " \r\n");
while (it.next()) |lib_arg| {
if (mem.startsWith(u8, lib_arg, "-l")) {
try result.system_libs.append(lib_arg[2..]);
} else {
if (os.path.isAbsolute(lib_arg)) {
try result.libs.append(lib_arg);
} else {
try result.system_libs.append(lib_arg);
}
}
}
}
{
var it = mem.split(includes_output, " \r\n");
while (it.next()) |include_arg| {
if (mem.startsWith(u8, include_arg, "-I")) {
try result.includes.append(include_arg[2..]);
} else {
try result.includes.append(include_arg);
}
}
}
{
var it = mem.split(libdir_output, " \r\n");
while (it.next()) |libdir| {
if (mem.startsWith(u8, libdir, "-L")) {
try result.libdirs.append(libdir[2..]);
} else {
try result.libdirs.append(libdir);
}
}
}
return result;
}
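The first loop in `findLLVM` sorts each token of the `llvm-config --libs --system-libs` output into one of two lists. A minimal sketch of that decision in C follows, assuming POSIX-style absolute paths only; `lib_kind` and `classify_lib_arg` are hypothetical names used for illustration and are not part of the build script.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names mirroring the two destination lists in findLLVM. */
enum lib_kind { LIB_SYSTEM, LIB_OBJECT_FILE };

/* Classify one whitespace-separated token from llvm-config output.
 * On return, *name points at the text to record, with any "-l" prefix
 * stripped. Assumption: only paths starting with '/' count as absolute. */
static enum lib_kind classify_lib_arg(const char *arg, const char **name) {
    assert(arg != NULL && name != NULL);
    if (strncmp(arg, "-l", 2) == 0) {
        *name = arg + 2;          /* "-lfoo" is the system library "foo" */
        return LIB_SYSTEM;
    }
    *name = arg;
    /* An absolute path is linked as a file; anything else is treated as a
     * system library name, as in the Zig code above. */
    return (arg[0] == '/') ? LIB_OBJECT_FILE : LIB_SYSTEM;
}
```

Under these assumptions, `-lLLVMCore` would be recorded as the system library `LLVMCore`, while `/usr/lib/libLLVMCore.a` would be passed through as an object file.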
pub fn installStdLib(b: &Builder, stdlib_files: []const u8) void {
var it = mem.split(stdlib_files, ";");
while (it.next()) |stdlib_file| {
const src_path = os.path.join(b.allocator, "std", stdlib_file) catch unreachable;
const dest_path = os.path.join(b.allocator, "lib", "zig", "std", stdlib_file) catch unreachable;
b.installFile(src_path, dest_path);
}
}
pub fn installCHeaders(b: &Builder, c_header_files: []const u8) void {
var it = mem.split(c_header_files, ";");
while (it.next()) |c_header_file| {
const src_path = os.path.join(b.allocator, "c_headers", c_header_file) catch unreachable;
const dest_path = os.path.join(b.allocator, "lib", "zig", "include", c_header_file) catch unreachable;
b.installFile(src_path, dest_path);
}
}
fn nextValue(index: &usize, build_info: []const u8) []const u8 {
const start = *index;
while (true) : (*index += 1) {
switch (build_info[*index]) {
'\n' => {
const result = build_info[start..*index];
*index += 1;
return result;
},
'\r' => {
const result = build_info[start..*index];
*index += 2;
return result;
},
else => continue,
}
}
}
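`nextValue` walks the `BUILD_INFO` output one line at a time and accepts both `\n` and `\r\n` terminators. The same cursor-based scan can be sketched in C as below. The name `next_value` and its signature are illustrative assumptions. Unlike the Zig loop, this sketch also stops at the end of the buffer and returns the trailing bytes as the last value.

```c
#include <assert.h>
#include <stddef.h>

/* Return the start of the next line in buf[0..len) and store its length in
 * *out_len. *index is advanced past the terminator ("\n" or "\r\n"). */
static const char *next_value(const char *buf, size_t len, size_t *index,
                              size_t *out_len) {
    assert(buf != NULL && index != NULL && out_len != NULL && *index <= len);
    size_t start = *index;
    while (*index < len) {
        char c = buf[*index];
        if (c == '\n') {
            *out_len = *index - start;
            *index += 1;                 /* skip "\n" */
            return buf + start;
        }
        if (c == '\r') {
            *out_len = *index - start;
            *index += 2;                 /* assume "\r" is followed by "\n" */
            if (*index > len) *index = len;
            return buf + start;
        }
        *index += 1;
    }
    /* No terminator found: hand back whatever is left. */
    *out_len = len - start;
    return buf + start;
}
```

Each call yields one field, so a caller would invoke it once per expected value in the same order the values were printed.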


@@ -131,15 +131,6 @@ __DEVICE__ float ldexp(float __arg, int __exp) {
__DEVICE__ float log(float __x) { return ::logf(__x); }
__DEVICE__ float log10(float __x) { return ::log10f(__x); }
__DEVICE__ float modf(float __x, float *__iptr) { return ::modff(__x, __iptr); }
__DEVICE__ float nexttoward(float __from, double __to) {
return __builtin_nexttowardf(__from, __to);
}
__DEVICE__ double nexttoward(double __from, double __to) {
return __builtin_nexttoward(__from, __to);
}
__DEVICE__ float nexttowardf(float __from, double __to) {
return __builtin_nexttowardf(__from, __to);
}
__DEVICE__ float pow(float __base, float __exp) {
return ::powf(__base, __exp);
}
@@ -157,6 +148,10 @@ __DEVICE__ float sqrt(float __x) { return ::sqrtf(__x); }
__DEVICE__ float tan(float __x) { return ::tanf(__x); }
__DEVICE__ float tanh(float __x) { return ::tanhf(__x); }
// Notably missing above is nexttoward. We omit it because
// libdevice doesn't provide an implementation, and we don't want to be in the
// business of implementing tricky libm functions in this header.
// Now we've defined everything we promised we'd define in
// __clang_cuda_math_forward_declares.h. We need to do two additional things to
// fix up our math functions.
@@ -295,13 +290,6 @@ ldexp(__T __x, int __exp) {
return std::ldexp((double)__x, __exp);
}
template <typename __T>
__DEVICE__ typename __clang_cuda_enable_if<std::numeric_limits<__T>::is_integer,
double>::type
nexttoward(__T __from, double __to) {
return std::nexttoward((double)__from, __to);
}
template <typename __T1, typename __T2>
__DEVICE__ typename __clang_cuda_enable_if<
std::numeric_limits<__T1>::is_specialized &&
@@ -388,7 +376,6 @@ using ::lrint;
using ::lround;
using ::nearbyint;
using ::nextafter;
using ::nexttoward;
using ::pow;
using ::remainder;
using ::remquo;
@@ -456,8 +443,6 @@ using ::lroundf;
using ::modff;
using ::nearbyintf;
using ::nextafterf;
using ::nexttowardf;
using ::nexttowardf;
using ::powf;
using ::remainderf;
using ::remquof;


@@ -34,23 +34,24 @@
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 300
#pragma push_macro("__MAKE_SHUFFLES")
#define __MAKE_SHUFFLES(__FnName, __IntIntrinsic, __FloatIntrinsic, __Mask) \
inline __device__ int __FnName(int __val, int __offset, \
#define __MAKE_SHUFFLES(__FnName, __IntIntrinsic, __FloatIntrinsic, __Mask, \
__Type) \
inline __device__ int __FnName(int __val, __Type __offset, \
int __width = warpSize) { \
return __IntIntrinsic(__val, __offset, \
((warpSize - __width) << 8) | (__Mask)); \
} \
inline __device__ float __FnName(float __val, int __offset, \
inline __device__ float __FnName(float __val, __Type __offset, \
int __width = warpSize) { \
return __FloatIntrinsic(__val, __offset, \
((warpSize - __width) << 8) | (__Mask)); \
} \
inline __device__ unsigned int __FnName(unsigned int __val, int __offset, \
inline __device__ unsigned int __FnName(unsigned int __val, __Type __offset, \
int __width = warpSize) { \
return static_cast<unsigned int>( \
::__FnName(static_cast<int>(__val), __offset, __width)); \
} \
inline __device__ long long __FnName(long long __val, int __offset, \
inline __device__ long long __FnName(long long __val, __Type __offset, \
int __width = warpSize) { \
struct __Bits { \
int __a, __b; \
@@ -65,12 +66,29 @@
memcpy(&__ret, &__tmp, sizeof(__tmp)); \
return __ret; \
} \
inline __device__ long __FnName(long __val, __Type __offset, \
int __width = warpSize) { \
_Static_assert(sizeof(long) == sizeof(long long) || \
sizeof(long) == sizeof(int)); \
if (sizeof(long) == sizeof(long long)) { \
return static_cast<long>( \
::__FnName(static_cast<long long>(__val), __offset, __width)); \
} else if (sizeof(long) == sizeof(int)) { \
return static_cast<long>( \
::__FnName(static_cast<int>(__val), __offset, __width)); \
} \
} \
inline __device__ unsigned long __FnName( \
unsigned long __val, __Type __offset, int __width = warpSize) { \
return static_cast<unsigned long>( \
::__FnName(static_cast<long>(__val), __offset, __width)); \
} \
inline __device__ unsigned long long __FnName( \
unsigned long long __val, int __offset, int __width = warpSize) { \
unsigned long long __val, __Type __offset, int __width = warpSize) { \
return static_cast<unsigned long long>(::__FnName( \
static_cast<long long>(__val), __offset, __width)); \
} \
inline __device__ double __FnName(double __val, int __offset, \
inline __device__ double __FnName(double __val, __Type __offset, \
int __width = warpSize) { \
long long __tmp; \
_Static_assert(sizeof(__tmp) == sizeof(__val)); \
@@ -81,17 +99,166 @@
return __ret; \
}
__MAKE_SHUFFLES(__shfl, __nvvm_shfl_idx_i32, __nvvm_shfl_idx_f32, 0x1f);
__MAKE_SHUFFLES(__shfl, __nvvm_shfl_idx_i32, __nvvm_shfl_idx_f32, 0x1f, int);
// We use 0 rather than 31 as our mask, because shfl.up applies to lanes >=
// maxLane.
__MAKE_SHUFFLES(__shfl_up, __nvvm_shfl_up_i32, __nvvm_shfl_up_f32, 0);
__MAKE_SHUFFLES(__shfl_down, __nvvm_shfl_down_i32, __nvvm_shfl_down_f32, 0x1f);
__MAKE_SHUFFLES(__shfl_xor, __nvvm_shfl_bfly_i32, __nvvm_shfl_bfly_f32, 0x1f);
__MAKE_SHUFFLES(__shfl_up, __nvvm_shfl_up_i32, __nvvm_shfl_up_f32, 0,
unsigned int);
__MAKE_SHUFFLES(__shfl_down, __nvvm_shfl_down_i32, __nvvm_shfl_down_f32, 0x1f,
unsigned int);
__MAKE_SHUFFLES(__shfl_xor, __nvvm_shfl_bfly_i32, __nvvm_shfl_bfly_f32, 0x1f,
int);
#pragma pop_macro("__MAKE_SHUFFLES")
#endif // !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 300
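Every shuffle wrapper above passes the same third operand to the intrinsic: `((warpSize - __width) << 8) | (__Mask)`. A small host-side sketch of that arithmetic follows, assuming a warp size of 32 and a power-of-two `width` in the range 1 to 32; `WARP_SIZE` and `shfl_control` are hypothetical names, not part of the header.

```c
#include <assert.h>

enum { WARP_SIZE = 32 }; /* assumption: warpSize is 32 on the target */

/* Build the packed control operand used by the shuffle wrappers: the
 * segment information goes in bits 8 and up, the clamp mask in the low
 * bits. */
static int shfl_control(int width, int mask) {
    /* Assumption: width is a power of two between 1 and WARP_SIZE. */
    assert(width > 0 && width <= WARP_SIZE && (width & (width - 1)) == 0);
    return ((WARP_SIZE - width) << 8) | mask;
}
```

With these assumptions, a full-warp shuffle with mask `0x1f` yields `0x1f`, and a 16-lane shuffle with the same mask yields `0x101f`.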
#if CUDA_VERSION >= 9000
#if (!defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 300)
// __shfl_sync_* variants available in CUDA-9
#pragma push_macro("__MAKE_SYNC_SHUFFLES")
#define __MAKE_SYNC_SHUFFLES(__FnName, __IntIntrinsic, __FloatIntrinsic, \
__Mask, __Type) \
inline __device__ int __FnName(unsigned int __mask, int __val, \
__Type __offset, int __width = warpSize) { \
return __IntIntrinsic(__mask, __val, __offset, \
((warpSize - __width) << 8) | (__Mask)); \
} \
inline __device__ float __FnName(unsigned int __mask, float __val, \
__Type __offset, int __width = warpSize) { \
return __FloatIntrinsic(__mask, __val, __offset, \
((warpSize - __width) << 8) | (__Mask)); \
} \
inline __device__ unsigned int __FnName(unsigned int __mask, \
unsigned int __val, __Type __offset, \
int __width = warpSize) { \
return static_cast<unsigned int>( \
::__FnName(__mask, static_cast<int>(__val), __offset, __width)); \
} \
inline __device__ long long __FnName(unsigned int __mask, long long __val, \
__Type __offset, \
int __width = warpSize) { \
struct __Bits { \
int __a, __b; \
}; \
_Static_assert(sizeof(__val) == sizeof(__Bits)); \
_Static_assert(sizeof(__Bits) == 2 * sizeof(int)); \
__Bits __tmp; \
memcpy(&__tmp, &__val, sizeof(__val)); \
__tmp.__a = ::__FnName(__mask, __tmp.__a, __offset, __width); \
__tmp.__b = ::__FnName(__mask, __tmp.__b, __offset, __width); \
long long __ret; \
memcpy(&__ret, &__tmp, sizeof(__tmp)); \
return __ret; \
} \
inline __device__ unsigned long long __FnName( \
unsigned int __mask, unsigned long long __val, __Type __offset, \
int __width = warpSize) { \
return static_cast<unsigned long long>(::__FnName( \
__mask, static_cast<long long>(__val), __offset, __width)); \
} \
inline __device__ long __FnName(unsigned int __mask, long __val, \
__Type __offset, int __width = warpSize) { \
_Static_assert(sizeof(long) == sizeof(long long) || \
sizeof(long) == sizeof(int)); \
if (sizeof(long) == sizeof(long long)) { \
return static_cast<long>(::__FnName( \
__mask, static_cast<long long>(__val), __offset, __width)); \
} else if (sizeof(long) == sizeof(int)) { \
return static_cast<long>( \
::__FnName(__mask, static_cast<int>(__val), __offset, __width)); \
} \
} \
inline __device__ unsigned long __FnName( \
unsigned int __mask, unsigned long __val, __Type __offset, \
int __width = warpSize) { \
return static_cast<unsigned long>( \
::__FnName(__mask, static_cast<long>(__val), __offset, __width)); \
} \
inline __device__ double __FnName(unsigned int __mask, double __val, \
__Type __offset, int __width = warpSize) { \
long long __tmp; \
_Static_assert(sizeof(__tmp) == sizeof(__val)); \
memcpy(&__tmp, &__val, sizeof(__val)); \
__tmp = ::__FnName(__mask, __tmp, __offset, __width); \
double __ret; \
memcpy(&__ret, &__tmp, sizeof(__ret)); \
return __ret; \
}
__MAKE_SYNC_SHUFFLES(__shfl_sync, __nvvm_shfl_sync_idx_i32,
__nvvm_shfl_sync_idx_f32, 0x1f, int);
// We use 0 rather than 31 as our mask, because shfl.up applies to lanes >=
// maxLane.
__MAKE_SYNC_SHUFFLES(__shfl_up_sync, __nvvm_shfl_sync_up_i32,
__nvvm_shfl_sync_up_f32, 0, unsigned int);
__MAKE_SYNC_SHUFFLES(__shfl_down_sync, __nvvm_shfl_sync_down_i32,
__nvvm_shfl_sync_down_f32, 0x1f, unsigned int);
__MAKE_SYNC_SHUFFLES(__shfl_xor_sync, __nvvm_shfl_sync_bfly_i32,
__nvvm_shfl_sync_bfly_f32, 0x1f, int);
#pragma pop_macro("__MAKE_SYNC_SHUFFLES")
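The `long long` overloads handle a 64-bit value by copying it into a pair of 32-bit integers, applying the 32-bit operation to each half, and copying the pair back. The sketch below shows that pattern on the host in plain C11. `bits64`, `lane_op32`, and `apply_by_halves` are hypothetical names, and a function pointer stands in for the 32-bit shuffle.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 32-bit shuffle: any int32 -> int32 operation. */
typedef int32_t (*lane_op32)(int32_t);

struct bits64 { int32_t a, b; };

/* Apply a 32-bit operation to both halves of a 64-bit value. memcpy is used
 * in both directions so no aliasing rules are violated. */
static int64_t apply_by_halves(int64_t val, lane_op32 op) {
    _Static_assert(sizeof(struct bits64) == sizeof(int64_t),
                   "two 32-bit halves must cover the 64-bit value");
    struct bits64 tmp;
    int64_t ret;
    assert(op != NULL);
    memcpy(&tmp, &val, sizeof(val));   /* value -> halves */
    tmp.a = op(tmp.a);
    tmp.b = op(tmp.b);
    memcpy(&ret, &tmp, sizeof(tmp));   /* halves -> value */
    return ret;
}

/* Two trivial operations for experimenting with the helper. */
static int32_t identity32(int32_t x) { return x; }
static int32_t invert32(int32_t x) { return ~x; }
```

The direction of the first `memcpy` matters: the halves must be filled from the input value before the operation is applied.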
inline __device__ void __syncwarp(unsigned int mask = 0xffffffff) {
return __nvvm_bar_warp_sync(mask);
}
inline __device__ void __barrier_sync(unsigned int id) {
__nvvm_barrier_sync(id);
}
inline __device__ void __barrier_sync_count(unsigned int id,
unsigned int count) {
__nvvm_barrier_sync_cnt(id, count);
}
inline __device__ int __all_sync(unsigned int mask, int pred) {
return __nvvm_vote_all_sync(mask, pred);
}
inline __device__ int __any_sync(unsigned int mask, int pred) {
return __nvvm_vote_any_sync(mask, pred);
}
inline __device__ int __uni_sync(unsigned int mask, int pred) {
return __nvvm_vote_uni_sync(mask, pred);
}
inline __device__ unsigned int __ballot_sync(unsigned int mask, int pred) {
return __nvvm_vote_ballot_sync(mask, pred);
}
inline __device__ unsigned int __activemask() { return __nvvm_vote_ballot(1); }
inline __device__ unsigned int __fns(unsigned mask, unsigned base, int offset) {
return __nvvm_fns(mask, base, offset);
}
#endif // !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 300
// Define __match* builtins CUDA-9 headers expect to see.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 700
inline __device__ unsigned int __match32_any_sync(unsigned int mask,
unsigned int value) {
return __nvvm_match_any_sync_i32(mask, value);
}
inline __device__ unsigned long long
__match64_any_sync(unsigned int mask, unsigned long long value) {
return __nvvm_match_any_sync_i64(mask, value);
}
inline __device__ unsigned int
__match32_all_sync(unsigned int mask, unsigned int value, int *pred) {
return __nvvm_match_all_sync_i32p(mask, value, pred);
}
inline __device__ unsigned long long
__match64_all_sync(unsigned int mask, unsigned long long value, int *pred) {
return __nvvm_match_all_sync_i64p(mask, value, pred);
}
#include "crt/sm_70_rt.hpp"
#endif // !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 700
#endif // CUDA_VERSION >= 9000
// sm_32 intrinsics: __ldg and __funnelshift_{l,lc,r,rc}.
// Prevent the vanilla sm_32 intrinsics header from being included.


@@ -149,9 +149,6 @@ __DEVICE__ double nearbyint(double);
__DEVICE__ float nearbyint(float);
__DEVICE__ double nextafter(double, double);
__DEVICE__ float nextafter(float, float);
__DEVICE__ double nexttoward(double, double);
__DEVICE__ float nexttoward(float, double);
__DEVICE__ float nexttowardf(float, double);
__DEVICE__ double pow(double, double);
__DEVICE__ double pow(double, int);
__DEVICE__ float pow(float, float);
@@ -185,6 +182,10 @@ __DEVICE__ float tgamma(float);
__DEVICE__ double trunc(double);
__DEVICE__ float trunc(float);
// Notably missing above is nexttoward, which we don't define on
// the device side because libdevice doesn't give us an implementation, and we
// don't want to be in the business of writing one ourselves.
// We need to define these overloads in exactly the namespace our standard
// library uses (including the right inline namespace), otherwise they won't be
// picked up by other functions in the standard library (e.g. functions in
@@ -255,7 +256,6 @@ using ::nan;
using ::nanf;
using ::nearbyint;
using ::nextafter;
using ::nexttoward;
using ::pow;
using ::remainder;
using ::remquo;


@@ -62,7 +62,7 @@
#include "cuda.h"
#if !defined(CUDA_VERSION)
#error "cuda.h did not define CUDA_VERSION"
#elif CUDA_VERSION < 7000 || CUDA_VERSION > 8000
#elif CUDA_VERSION < 7000 || CUDA_VERSION > 9000
#error "Unsupported CUDA version!"
#endif
@@ -86,7 +86,11 @@
#define __COMMON_FUNCTIONS_H__
#undef __CUDACC__
#if CUDA_VERSION < 9000
#define __CUDABE__
#else
#define __CUDA_LIBDEVICE__
#endif
// Disables definitions of device-side runtime support stubs in
// cuda_device_runtime_api.h
#include "driver_types.h"
@@ -94,6 +98,7 @@
#include "host_defines.h"
#undef __CUDABE__
#undef __CUDA_LIBDEVICE__
#define __CUDACC__
#include "cuda_runtime.h"
@@ -105,7 +110,9 @@
#define __nvvm_memcpy(s, d, n, a) __builtin_memcpy(s, d, n)
#define __nvvm_memset(d, c, n, a) __builtin_memset(d, c, n)
#if CUDA_VERSION < 9000
#include "crt/device_runtime.h"
#endif
#include "crt/host_runtime.h"
// device_runtime.h defines __cxa_* macros that will conflict with
// cxxabi.h.
@@ -166,7 +173,18 @@ inline __host__ double __signbitd(double x) {
// __device__.
#pragma push_macro("__forceinline__")
#define __forceinline__ __device__ __inline__ __attribute__((always_inline))
#pragma push_macro("__float2half_rn")
#if CUDA_VERSION >= 9000
// CUDA-9 has conflicting prototypes for __float2half_rn(float f) in
// cuda_fp16.h[pp] and device_functions.hpp. We need to get the one in
// device_functions.hpp out of the way.
#define __float2half_rn __float2half_rn_disabled
#endif
#include "device_functions.hpp"
#pragma pop_macro("__float2half_rn")
// math_function.hpp uses the __USE_FAST_MATH__ macro to determine whether we
// get the slow-but-accurate or fast-but-inaccurate versions of functions like
@@ -247,7 +265,23 @@ static inline __device__ void __brkpt(int __c) { __brkpt(); }
#pragma push_macro("__GNUC__")
#undef __GNUC__
#define signbit __ignored_cuda_signbit
// CUDA-9 omits device-side definitions of some math functions if it sees
// include guard from math.h wrapper from libstdc++. We have to undo the header
// guard temporarily to get the definitions we need.
#pragma push_macro("_GLIBCXX_MATH_H")
#pragma push_macro("_LIBCPP_VERSION")
#if CUDA_VERSION >= 9000
#undef _GLIBCXX_MATH_H
// We also need to undo another guard that checks for libc++ 3.8+
#ifdef _LIBCPP_VERSION
#define _LIBCPP_VERSION 3700
#endif
#endif
#include "math_functions.hpp"
#pragma pop_macro("_GLIBCXX_MATH_H")
#pragma pop_macro("_LIBCPP_VERSION")
#pragma pop_macro("__GNUC__")
#pragma pop_macro("signbit")

c_headers/arm64intr.h Normal file

@@ -0,0 +1,49 @@
/*===---- arm64intr.h - ARM64 Windows intrinsics -------------------------------===
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
/* Only include this if we're compiling for the windows platform. */
#ifndef _MSC_VER
#include_next <arm64intr.h>
#else
#ifndef __ARM64INTR_H
#define __ARM64INTR_H
typedef enum
{
_ARM64_BARRIER_SY = 0xF,
_ARM64_BARRIER_ST = 0xE,
_ARM64_BARRIER_LD = 0xD,
_ARM64_BARRIER_ISH = 0xB,
_ARM64_BARRIER_ISHST = 0xA,
_ARM64_BARRIER_ISHLD = 0x9,
_ARM64_BARRIER_NSH = 0x7,
_ARM64_BARRIER_NSHST = 0x6,
_ARM64_BARRIER_NSHLD = 0x5,
_ARM64_BARRIER_OSH = 0x3,
_ARM64_BARRIER_OSHST = 0x2,
_ARM64_BARRIER_OSHLD = 0x1
} _ARM64INTR_BARRIER_TYPE;
#endif /* __ARM64INTR_H */
#endif /* _MSC_VER */

File diff suppressed because it is too large


@@ -145,13 +145,21 @@ _mm256_andnot_si256(__m256i __a, __m256i __b)
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_avg_epu8(__m256i __a, __m256i __b)
{
return (__m256i)__builtin_ia32_pavgb256((__v32qi)__a, (__v32qi)__b);
typedef unsigned short __v32hu __attribute__((__vector_size__(64)));
return (__m256i)__builtin_convertvector(
((__builtin_convertvector((__v32qu)__a, __v32hu) +
__builtin_convertvector((__v32qu)__b, __v32hu)) + 1)
>> 1, __v32qu);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_avg_epu16(__m256i __a, __m256i __b)
{
return (__m256i)__builtin_ia32_pavgw256((__v16hi)__a, (__v16hi)__b);
typedef unsigned int __v16su __attribute__((__vector_size__(64)));
return (__m256i)__builtin_convertvector(
((__builtin_convertvector((__v16hu)__a, __v16su) +
__builtin_convertvector((__v16hu)__b, __v16su)) + 1)
>> 1, __v16hu);
}
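The replacement bodies for `_mm256_avg_epu8` and `_mm256_avg_epu16` widen each lane, add the operands plus one, shift right by one, and narrow again. A scalar sketch of the 8-bit case is given below; `avg_round_u8` is a hypothetical helper that models a single lane, not an intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding average of two unsigned bytes: (a + b + 1) >> 1, computed in a
 * 16-bit intermediate so the sum cannot overflow. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b) {
    uint16_t wide = (uint16_t)((uint16_t)a + (uint16_t)b + 1u);
    uint16_t half = (uint16_t)(wide >> 1);
    assert(half <= UINT8_MAX);   /* at most (255 + 255 + 1) >> 1 == 255 */
    return (uint8_t)half;
}
```

The widening step is what makes the formula safe: in 8-bit arithmetic `255 + 255 + 1` would wrap, whereas the 16-bit intermediate holds the full sum.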
static __inline__ __m256i __DEFAULT_FN_ATTRS


@@ -0,0 +1,97 @@
/*===------------- avx512bitalgintrin.h - BITALG intrinsics ------------------===
*
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error "Never use <avx512bitalgintrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __AVX512BITALGINTRIN_H
#define __AVX512BITALGINTRIN_H
/* Define the default attributes for the functions in this file. */
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx512bitalg")))
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_popcnt_epi16(__m512i __A)
{
return (__m512i) __builtin_ia32_vpopcntw_512((__v32hi) __A);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_popcnt_epi16(__m512i __A, __mmask32 __U, __m512i __B)
{
return (__m512i) __builtin_ia32_selectw_512((__mmask32) __U,
(__v32hi) _mm512_popcnt_epi16(__B),
(__v32hi) __A);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_popcnt_epi16(__mmask32 __U, __m512i __B)
{
return _mm512_mask_popcnt_epi16((__m512i) _mm512_setzero_hi(),
__U,
__B);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_popcnt_epi8(__m512i __A)
{
return (__m512i) __builtin_ia32_vpopcntb_512((__v64qi) __A);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_popcnt_epi8(__m512i __A, __mmask64 __U, __m512i __B)
{
return (__m512i) __builtin_ia32_selectb_512((__mmask64) __U,
(__v64qi) _mm512_popcnt_epi8(__B),
(__v64qi) __A);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_popcnt_epi8(__mmask64 __U, __m512i __B)
{
return _mm512_mask_popcnt_epi8((__m512i) _mm512_setzero_qi(),
__U,
__B);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_bitshuffle_epi64_mask(__mmask64 __U, __m512i __A, __m512i __B)
{
return (__mmask64) __builtin_ia32_vpshufbitqmb512_mask((__v64qi) __A,
(__v64qi) __B,
__U);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_bitshuffle_epi64_mask(__m512i __A, __m512i __B)
{
return _mm512_mask_bitshuffle_epi64_mask((__mmask64) -1,
__A,
__B);
}
#undef __DEFAULT_FN_ATTRS
#endif
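The `mask` and `maskz` popcount variants in this header differ only in what an unselected lane receives: the corresponding lane of a pass-through vector, or zero. A scalar model of the 16-bit case is sketched below. The function names are hypothetical, and plain arrays with a lane count of at most 32 stand in for the 512-bit vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count set bits in one 16-bit lane by clearing the lowest set bit. */
static uint16_t popcount16(uint16_t x) {
    uint16_t n = 0;
    while (x) {
        x &= (uint16_t)(x - 1);
        n++;
    }
    return n;
}

/* dst[i] = popcount(src[i]) if bit i of mask is set, else passthru[i]. */
static void mask_popcnt_u16(uint16_t *dst, const uint16_t *passthru,
                            uint32_t mask, const uint16_t *src, size_t lanes) {
    assert(lanes <= 32);
    for (size_t i = 0; i < lanes; i++) {
        dst[i] = ((mask >> i) & 1u) ? popcount16(src[i]) : passthru[i];
    }
}

/* Zeroing variant: the masked form with an all-zero pass-through vector. */
static void maskz_popcnt_u16(uint16_t *dst, uint32_t mask,
                             const uint16_t *src, size_t lanes) {
    uint16_t zeros[32] = {0};
    mask_popcnt_u16(dst, zeros, mask, src, lanes);
}
```

Expressing the zeroing form through the masked form follows the same layering the header uses for `_mm512_maskz_popcnt_epi16`.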


@@ -56,293 +56,145 @@ _mm512_setzero_hi(void) {
/* Integer compare */
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmpeq_epi8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_pcmpeqb512_mask((__v64qi)__a, (__v64qi)__b,
(__mmask64)-1);
}
#define _mm512_cmp_epi8_mask(a, b, p) __extension__ ({ \
(__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)(__m512i)(a), \
(__v64qi)(__m512i)(b), (int)(p), \
(__mmask64)-1); })
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmpeq_epi8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_pcmpeqb512_mask((__v64qi)__a, (__v64qi)__b,
__u);
}
#define _mm512_mask_cmp_epi8_mask(m, a, b, p) __extension__ ({ \
(__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)(__m512i)(a), \
(__v64qi)(__m512i)(b), (int)(p), \
(__mmask64)(m)); })
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmpeq_epu8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 0,
(__mmask64)-1);
}
#define _mm512_cmp_epu8_mask(a, b, p) __extension__ ({ \
(__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)(__m512i)(a), \
(__v64qi)(__m512i)(b), (int)(p), \
(__mmask64)-1); })
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmpeq_epu8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 0,
__u);
}
#define _mm512_mask_cmp_epu8_mask(m, a, b, p) __extension__ ({ \
(__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)(__m512i)(a), \
(__v64qi)(__m512i)(b), (int)(p), \
(__mmask64)(m)); })
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmpeq_epi16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_pcmpeqw512_mask((__v32hi)__a, (__v32hi)__b,
(__mmask32)-1);
}
#define _mm512_cmp_epi16_mask(a, b, p) __extension__ ({ \
(__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)(__m512i)(a), \
(__v32hi)(__m512i)(b), (int)(p), \
(__mmask32)-1); })
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmpeq_epi16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_pcmpeqw512_mask((__v32hi)__a, (__v32hi)__b,
__u);
}
#define _mm512_mask_cmp_epi16_mask(m, a, b, p) __extension__ ({ \
(__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)(__m512i)(a), \
(__v32hi)(__m512i)(b), (int)(p), \
(__mmask32)(m)); })
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmpeq_epu16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 0,
(__mmask32)-1);
}
#define _mm512_cmp_epu16_mask(a, b, p) __extension__ ({ \
(__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)(__m512i)(a), \
(__v32hi)(__m512i)(b), (int)(p), \
(__mmask32)-1); })
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmpeq_epu16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 0,
__u);
}
#define _mm512_mask_cmp_epu16_mask(m, a, b, p) __extension__ ({ \
(__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)(__m512i)(a), \
(__v32hi)(__m512i)(b), (int)(p), \
(__mmask32)(m)); })
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmpge_epi8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)__a, (__v64qi)__b, 5,
(__mmask64)-1);
}
#define _mm512_cmpeq_epi8_mask(A, B) \
_mm512_cmp_epi8_mask((A), (B), _MM_CMPINT_EQ)
#define _mm512_mask_cmpeq_epi8_mask(k, A, B) \
_mm512_mask_cmp_epi8_mask((k), (A), (B), _MM_CMPINT_EQ)
#define _mm512_cmpge_epi8_mask(A, B) \
_mm512_cmp_epi8_mask((A), (B), _MM_CMPINT_GE)
#define _mm512_mask_cmpge_epi8_mask(k, A, B) \
_mm512_mask_cmp_epi8_mask((k), (A), (B), _MM_CMPINT_GE)
#define _mm512_cmpgt_epi8_mask(A, B) \
_mm512_cmp_epi8_mask((A), (B), _MM_CMPINT_GT)
#define _mm512_mask_cmpgt_epi8_mask(k, A, B) \
_mm512_mask_cmp_epi8_mask((k), (A), (B), _MM_CMPINT_GT)
#define _mm512_cmple_epi8_mask(A, B) \
_mm512_cmp_epi8_mask((A), (B), _MM_CMPINT_LE)
#define _mm512_mask_cmple_epi8_mask(k, A, B) \
_mm512_mask_cmp_epi8_mask((k), (A), (B), _MM_CMPINT_LE)
#define _mm512_cmplt_epi8_mask(A, B) \
_mm512_cmp_epi8_mask((A), (B), _MM_CMPINT_LT)
#define _mm512_mask_cmplt_epi8_mask(k, A, B) \
_mm512_mask_cmp_epi8_mask((k), (A), (B), _MM_CMPINT_LT)
#define _mm512_cmpneq_epi8_mask(A, B) \
_mm512_cmp_epi8_mask((A), (B), _MM_CMPINT_NE)
#define _mm512_mask_cmpneq_epi8_mask(k, A, B) \
_mm512_mask_cmp_epi8_mask((k), (A), (B), _MM_CMPINT_NE)
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmpge_epi8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)__a, (__v64qi)__b, 5,
__u);
}
#define _mm512_cmpeq_epu8_mask(A, B) \
_mm512_cmp_epu8_mask((A), (B), _MM_CMPINT_EQ)
#define _mm512_mask_cmpeq_epu8_mask(k, A, B) \
_mm512_mask_cmp_epu8_mask((k), (A), (B), _MM_CMPINT_EQ)
#define _mm512_cmpge_epu8_mask(A, B) \
_mm512_cmp_epu8_mask((A), (B), _MM_CMPINT_GE)
#define _mm512_mask_cmpge_epu8_mask(k, A, B) \
_mm512_mask_cmp_epu8_mask((k), (A), (B), _MM_CMPINT_GE)
#define _mm512_cmpgt_epu8_mask(A, B) \
_mm512_cmp_epu8_mask((A), (B), _MM_CMPINT_GT)
#define _mm512_mask_cmpgt_epu8_mask(k, A, B) \
_mm512_mask_cmp_epu8_mask((k), (A), (B), _MM_CMPINT_GT)
#define _mm512_cmple_epu8_mask(A, B) \
_mm512_cmp_epu8_mask((A), (B), _MM_CMPINT_LE)
#define _mm512_mask_cmple_epu8_mask(k, A, B) \
_mm512_mask_cmp_epu8_mask((k), (A), (B), _MM_CMPINT_LE)
#define _mm512_cmplt_epu8_mask(A, B) \
_mm512_cmp_epu8_mask((A), (B), _MM_CMPINT_LT)
#define _mm512_mask_cmplt_epu8_mask(k, A, B) \
_mm512_mask_cmp_epu8_mask((k), (A), (B), _MM_CMPINT_LT)
#define _mm512_cmpneq_epu8_mask(A, B) \
_mm512_cmp_epu8_mask((A), (B), _MM_CMPINT_NE)
#define _mm512_mask_cmpneq_epu8_mask(k, A, B) \
_mm512_mask_cmp_epu8_mask((k), (A), (B), _MM_CMPINT_NE)
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmpge_epu8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 5,
(__mmask64)-1);
}
#define _mm512_cmpeq_epi16_mask(A, B) \
_mm512_cmp_epi16_mask((A), (B), _MM_CMPINT_EQ)
#define _mm512_mask_cmpeq_epi16_mask(k, A, B) \
_mm512_mask_cmp_epi16_mask((k), (A), (B), _MM_CMPINT_EQ)
#define _mm512_cmpge_epi16_mask(A, B) \
_mm512_cmp_epi16_mask((A), (B), _MM_CMPINT_GE)
#define _mm512_mask_cmpge_epi16_mask(k, A, B) \
_mm512_mask_cmp_epi16_mask((k), (A), (B), _MM_CMPINT_GE)
#define _mm512_cmpgt_epi16_mask(A, B) \
_mm512_cmp_epi16_mask((A), (B), _MM_CMPINT_GT)
#define _mm512_mask_cmpgt_epi16_mask(k, A, B) \
_mm512_mask_cmp_epi16_mask((k), (A), (B), _MM_CMPINT_GT)
#define _mm512_cmple_epi16_mask(A, B) \
_mm512_cmp_epi16_mask((A), (B), _MM_CMPINT_LE)
#define _mm512_mask_cmple_epi16_mask(k, A, B) \
_mm512_mask_cmp_epi16_mask((k), (A), (B), _MM_CMPINT_LE)
#define _mm512_cmplt_epi16_mask(A, B) \
_mm512_cmp_epi16_mask((A), (B), _MM_CMPINT_LT)
#define _mm512_mask_cmplt_epi16_mask(k, A, B) \
_mm512_mask_cmp_epi16_mask((k), (A), (B), _MM_CMPINT_LT)
#define _mm512_cmpneq_epi16_mask(A, B) \
_mm512_cmp_epi16_mask((A), (B), _MM_CMPINT_NE)
#define _mm512_mask_cmpneq_epi16_mask(k, A, B) \
_mm512_mask_cmp_epi16_mask((k), (A), (B), _MM_CMPINT_NE)
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmpge_epu8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 5,
__u);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmpge_epi16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)__a, (__v32hi)__b, 5,
(__mmask32)-1);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmpge_epi16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)__a, (__v32hi)__b, 5,
__u);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmpge_epu16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 5,
(__mmask32)-1);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmpge_epu16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 5,
__u);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmpgt_epi8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_pcmpgtb512_mask((__v64qi)__a, (__v64qi)__b,
(__mmask64)-1);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmpgt_epi8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_pcmpgtb512_mask((__v64qi)__a, (__v64qi)__b,
__u);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmpgt_epu8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 6,
(__mmask64)-1);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmpgt_epu8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 6,
__u);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmpgt_epi16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_pcmpgtw512_mask((__v32hi)__a, (__v32hi)__b,
(__mmask32)-1);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmpgt_epi16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_pcmpgtw512_mask((__v32hi)__a, (__v32hi)__b,
__u);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmpgt_epu16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 6,
(__mmask32)-1);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmpgt_epu16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 6,
__u);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmple_epi8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)__a, (__v64qi)__b, 2,
(__mmask64)-1);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmple_epi8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)__a, (__v64qi)__b, 2,
__u);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmple_epu8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 2,
(__mmask64)-1);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmple_epu8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 2,
__u);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmple_epi16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)__a, (__v32hi)__b, 2,
(__mmask32)-1);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmple_epi16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)__a, (__v32hi)__b, 2,
__u);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmple_epu16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 2,
(__mmask32)-1);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmple_epu16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 2,
__u);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmplt_epi8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)__a, (__v64qi)__b, 1,
(__mmask64)-1);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmplt_epi8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)__a, (__v64qi)__b, 1,
__u);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmplt_epu8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 1,
(__mmask64)-1);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmplt_epu8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 1,
__u);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmplt_epi16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)__a, (__v32hi)__b, 1,
(__mmask32)-1);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmplt_epi16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)__a, (__v32hi)__b, 1,
__u);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmplt_epu16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 1,
(__mmask32)-1);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmplt_epu16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 1,
__u);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmpneq_epi8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)__a, (__v64qi)__b, 4,
(__mmask64)-1);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmpneq_epi8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)__a, (__v64qi)__b, 4,
__u);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_cmpneq_epu8_mask(__m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 4,
(__mmask64)-1);
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_cmpneq_epu8_mask(__mmask64 __u, __m512i __a, __m512i __b) {
return (__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)__a, (__v64qi)__b, 4,
__u);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmpneq_epi16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)__a, (__v32hi)__b, 4,
(__mmask32)-1);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmpneq_epi16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)__a, (__v32hi)__b, 4,
__u);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_cmpneq_epu16_mask(__m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 4,
(__mmask32)-1);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_cmpneq_epu16_mask(__mmask32 __u, __m512i __a, __m512i __b) {
return (__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)__a, (__v32hi)__b, 4,
__u);
}
#define _mm512_cmpeq_epu16_mask(A, B) \
_mm512_cmp_epu16_mask((A), (B), _MM_CMPINT_EQ)
#define _mm512_mask_cmpeq_epu16_mask(k, A, B) \
_mm512_mask_cmp_epu16_mask((k), (A), (B), _MM_CMPINT_EQ)
#define _mm512_cmpge_epu16_mask(A, B) \
_mm512_cmp_epu16_mask((A), (B), _MM_CMPINT_GE)
#define _mm512_mask_cmpge_epu16_mask(k, A, B) \
_mm512_mask_cmp_epu16_mask((k), (A), (B), _MM_CMPINT_GE)
#define _mm512_cmpgt_epu16_mask(A, B) \
_mm512_cmp_epu16_mask((A), (B), _MM_CMPINT_GT)
#define _mm512_mask_cmpgt_epu16_mask(k, A, B) \
_mm512_mask_cmp_epu16_mask((k), (A), (B), _MM_CMPINT_GT)
#define _mm512_cmple_epu16_mask(A, B) \
_mm512_cmp_epu16_mask((A), (B), _MM_CMPINT_LE)
#define _mm512_mask_cmple_epu16_mask(k, A, B) \
_mm512_mask_cmp_epu16_mask((k), (A), (B), _MM_CMPINT_LE)
#define _mm512_cmplt_epu16_mask(A, B) \
_mm512_cmp_epu16_mask((A), (B), _MM_CMPINT_LT)
#define _mm512_mask_cmplt_epu16_mask(k, A, B) \
_mm512_mask_cmp_epu16_mask((k), (A), (B), _MM_CMPINT_LT)
#define _mm512_cmpneq_epu16_mask(A, B) \
_mm512_cmp_epu16_mask((A), (B), _MM_CMPINT_NE)
#define _mm512_mask_cmpneq_epu16_mask(k, A, B) \
_mm512_mask_cmp_epu16_mask((k), (A), (B), _MM_CMPINT_NE)
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_add_epi8 (__m512i __A, __m512i __B) {
@@ -706,57 +558,55 @@ _mm512_maskz_adds_epu16 (__mmask32 __U, __m512i __A, __m512i __B)
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_avg_epu8 (__m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_pavgb512_mask ((__v64qi) __A,
(__v64qi) __B,
(__v64qi) _mm512_setzero_qi(),
(__mmask64) -1);
typedef unsigned short __v64hu __attribute__((__vector_size__(128)));
return (__m512i)__builtin_convertvector(
((__builtin_convertvector((__v64qu) __A, __v64hu) +
__builtin_convertvector((__v64qu) __B, __v64hu)) + 1)
>> 1, __v64qu);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_avg_epu8 (__m512i __W, __mmask64 __U, __m512i __A,
__m512i __B)
{
return (__m512i) __builtin_ia32_pavgb512_mask ((__v64qi) __A,
(__v64qi) __B,
(__v64qi) __W,
(__mmask64) __U);
return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U,
(__v64qi)_mm512_avg_epu8(__A, __B),
(__v64qi)__W);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_avg_epu8 (__mmask64 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_pavgb512_mask ((__v64qi) __A,
(__v64qi) __B,
(__v64qi) _mm512_setzero_qi(),
(__mmask64) __U);
return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U,
(__v64qi)_mm512_avg_epu8(__A, __B),
(__v64qi)_mm512_setzero_qi());
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_avg_epu16 (__m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_pavgw512_mask ((__v32hi) __A,
(__v32hi) __B,
(__v32hi) _mm512_setzero_hi(),
(__mmask32) -1);
typedef unsigned int __v32su __attribute__((__vector_size__(128)));
return (__m512i)__builtin_convertvector(
((__builtin_convertvector((__v32hu) __A, __v32su) +
__builtin_convertvector((__v32hu) __B, __v32su)) + 1)
>> 1, __v32hu);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_avg_epu16 (__m512i __W, __mmask32 __U, __m512i __A,
__m512i __B)
{
return (__m512i) __builtin_ia32_pavgw512_mask ((__v32hi) __A,
(__v32hi) __B,
(__v32hi) __W,
(__mmask32) __U);
return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U,
(__v32hi)_mm512_avg_epu16(__A, __B),
(__v32hi)__W);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_avg_epu16 (__mmask32 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_pavgw512_mask ((__v32hi) __A,
(__v32hi) __B,
(__v32hi) _mm512_setzero_hi(),
(__mmask32) __U);
return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U,
(__v32hi)_mm512_avg_epu16(__A, __B),
(__v32hi) _mm512_setzero_hi());
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
@@ -1543,46 +1393,6 @@ _mm512_maskz_cvtepu8_epi16(__mmask32 __U, __m256i __A)
}
#define _mm512_cmp_epi8_mask(a, b, p) __extension__ ({ \
(__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)(__m512i)(a), \
(__v64qi)(__m512i)(b), (int)(p), \
(__mmask64)-1); })
#define _mm512_mask_cmp_epi8_mask(m, a, b, p) __extension__ ({ \
(__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)(__m512i)(a), \
(__v64qi)(__m512i)(b), (int)(p), \
(__mmask64)(m)); })
#define _mm512_cmp_epu8_mask(a, b, p) __extension__ ({ \
(__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)(__m512i)(a), \
(__v64qi)(__m512i)(b), (int)(p), \
(__mmask64)-1); })
#define _mm512_mask_cmp_epu8_mask(m, a, b, p) __extension__ ({ \
(__mmask64)__builtin_ia32_ucmpb512_mask((__v64qi)(__m512i)(a), \
(__v64qi)(__m512i)(b), (int)(p), \
(__mmask64)(m)); })
#define _mm512_cmp_epi16_mask(a, b, p) __extension__ ({ \
(__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)(__m512i)(a), \
(__v32hi)(__m512i)(b), (int)(p), \
(__mmask32)-1); })
#define _mm512_mask_cmp_epi16_mask(m, a, b, p) __extension__ ({ \
(__mmask32)__builtin_ia32_cmpw512_mask((__v32hi)(__m512i)(a), \
(__v32hi)(__m512i)(b), (int)(p), \
(__mmask32)(m)); })
#define _mm512_cmp_epu16_mask(a, b, p) __extension__ ({ \
(__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)(__m512i)(a), \
(__v32hi)(__m512i)(b), (int)(p), \
(__mmask32)-1); })
#define _mm512_mask_cmp_epu16_mask(m, a, b, p) __extension__ ({ \
(__mmask32)__builtin_ia32_ucmpw512_mask((__v32hi)(__m512i)(a), \
(__v32hi)(__m512i)(b), (int)(p), \
(__mmask32)(m)); })
#define _mm512_shufflehi_epi16(A, imm) __extension__ ({ \
(__m512i)__builtin_shufflevector((__v32hi)(__m512i)(A), \
(__v32hi)_mm512_undefined_epi32(), \
@@ -2028,32 +1838,29 @@ _mm512_maskz_mov_epi8 (__mmask64 __U, __m512i __A)
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_set1_epi8 (__m512i __O, __mmask64 __M, char __A)
{
return (__m512i) __builtin_ia32_pbroadcastb512_gpr_mask (__A,
(__v64qi) __O,
__M);
return (__m512i) __builtin_ia32_selectb_512(__M,
(__v64qi)_mm512_set1_epi8(__A),
(__v64qi) __O);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_set1_epi8 (__mmask64 __M, char __A)
{
return (__m512i) __builtin_ia32_pbroadcastb512_gpr_mask (__A,
(__v64qi)
_mm512_setzero_qi(),
__M);
return (__m512i) __builtin_ia32_selectb_512(__M,
(__v64qi) _mm512_set1_epi8(__A),
(__v64qi) _mm512_setzero_si512());
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_kunpackd (__mmask64 __A, __mmask64 __B)
{
return (__mmask64) __builtin_ia32_kunpckdi ((__mmask64) __A,
(__mmask64) __B);
return (__mmask64) (( __A & 0xFFFFFFFF) | ( __B << 32));
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_kunpackw (__mmask32 __A, __mmask32 __B)
{
return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
(__mmask32) __B);
return (__mmask32) (( __A & 0xFFFF) | ( __B << 16));
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
@@ -2108,61 +1915,56 @@ _mm512_mask_storeu_epi8 (void *__P, __mmask64 __U, __m512i __A)
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_test_epi8_mask (__m512i __A, __m512i __B)
{
return (__mmask64) __builtin_ia32_ptestmb512 ((__v64qi) __A,
(__v64qi) __B,
(__mmask64) -1);
return _mm512_cmpneq_epi8_mask (_mm512_and_epi32 (__A, __B),
_mm512_setzero_qi());
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_test_epi8_mask (__mmask64 __U, __m512i __A, __m512i __B)
{
return (__mmask64) __builtin_ia32_ptestmb512 ((__v64qi) __A,
(__v64qi) __B, __U);
return _mm512_mask_cmpneq_epi8_mask (__U, _mm512_and_epi32 (__A, __B),
_mm512_setzero_qi());
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_test_epi16_mask (__m512i __A, __m512i __B)
{
return (__mmask32) __builtin_ia32_ptestmw512 ((__v32hi) __A,
(__v32hi) __B,
(__mmask32) -1);
return _mm512_cmpneq_epi16_mask (_mm512_and_epi32 (__A, __B),
_mm512_setzero_qi());
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_test_epi16_mask (__mmask32 __U, __m512i __A, __m512i __B)
{
return (__mmask32) __builtin_ia32_ptestmw512 ((__v32hi) __A,
(__v32hi) __B, __U);
return _mm512_mask_cmpneq_epi16_mask (__U, _mm512_and_epi32 (__A, __B),
_mm512_setzero_qi());
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_testn_epi8_mask (__m512i __A, __m512i __B)
{
return (__mmask64) __builtin_ia32_ptestnmb512 ((__v64qi) __A,
(__v64qi) __B,
(__mmask64) -1);
return _mm512_cmpeq_epi8_mask (_mm512_and_epi32 (__A, __B), _mm512_setzero_qi());
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
_mm512_mask_testn_epi8_mask (__mmask64 __U, __m512i __A, __m512i __B)
{
return (__mmask64) __builtin_ia32_ptestnmb512 ((__v64qi) __A,
(__v64qi) __B, __U);
return _mm512_mask_cmpeq_epi8_mask (__U, _mm512_and_epi32 (__A, __B),
_mm512_setzero_qi());
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_testn_epi16_mask (__m512i __A, __m512i __B)
{
return (__mmask32) __builtin_ia32_ptestnmw512 ((__v32hi) __A,
(__v32hi) __B,
(__mmask32) -1);
return _mm512_cmpeq_epi16_mask (_mm512_and_epi32 (__A, __B),
_mm512_setzero_qi());
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm512_mask_testn_epi16_mask (__mmask32 __U, __m512i __A, __m512i __B)
{
return (__mmask32) __builtin_ia32_ptestnmw512 ((__v32hi) __A,
(__v32hi) __B, __U);
return _mm512_mask_cmpeq_epi16_mask (__U, _mm512_and_epi32 (__A, __B),
_mm512_setzero_qi());
}
static __inline__ __mmask64 __DEFAULT_FN_ATTRS
@@ -2219,17 +2021,17 @@ _mm512_maskz_broadcastb_epi8 (__mmask64 __M, __m128i __A)
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_set1_epi16 (__m512i __O, __mmask32 __M, short __A)
{
return (__m512i) __builtin_ia32_pbroadcastw512_gpr_mask (__A,
(__v32hi) __O,
__M);
return (__m512i) __builtin_ia32_selectw_512(__M,
(__v32hi) _mm512_set1_epi16(__A),
(__v32hi) __O);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_set1_epi16 (__mmask32 __M, short __A)
{
return (__m512i) __builtin_ia32_pbroadcastw512_gpr_mask (__A,
(__v32hi) _mm512_setzero_hi(),
__M);
return (__m512i) __builtin_ia32_selectw_512(__M,
(__v32hi) _mm512_set1_epi16(__A),
(__v32hi) _mm512_setzero_si512());
}
static __inline__ __m512i __DEFAULT_FN_ATTRS

@@ -130,13 +130,14 @@ _mm512_maskz_lzcnt_epi64 (__mmask8 __U, __m512i __A)
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_broadcastmb_epi64 (__mmask8 __A)
{
return (__m512i) __builtin_ia32_broadcastmb512 (__A);
return (__m512i) _mm512_set1_epi64((long long) __A);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_broadcastmw_epi32 (__mmask16 __A)
{
return (__m512i) __builtin_ia32_broadcastmw512 (__A);
return (__m512i) _mm512_set1_epi32((int) __A);
}
#undef __DEFAULT_FN_ATTRS

@@ -973,25 +973,26 @@ _mm512_movepi64_mask (__m512i __A)
static __inline__ __m512 __DEFAULT_FN_ATTRS
_mm512_broadcast_f32x2 (__m128 __A)
{
return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
(__v16sf)_mm512_undefined_ps(),
(__mmask16) -1);
return (__m512)__builtin_shufflevector((__v4sf)__A,
(__v4sf)_mm_undefined_ps(),
0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1);
}
static __inline__ __m512 __DEFAULT_FN_ATTRS
_mm512_mask_broadcast_f32x2 (__m512 __O, __mmask16 __M, __m128 __A)
{
return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
(__v16sf)
__O, __M);
return (__m512)__builtin_ia32_selectps_512((__mmask16)__M,
(__v16sf)_mm512_broadcast_f32x2(__A),
(__v16sf)__O);
}
static __inline__ __m512 __DEFAULT_FN_ATTRS
_mm512_maskz_broadcast_f32x2 (__mmask16 __M, __m128 __A)
{
return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
(__v16sf)_mm512_setzero_ps (),
__M);
return (__m512)__builtin_ia32_selectps_512((__mmask16)__M,
(__v16sf)_mm512_broadcast_f32x2(__A),
(__v16sf)_mm512_setzero_ps());
}
static __inline__ __m512 __DEFAULT_FN_ATTRS
@@ -1044,25 +1045,26 @@ _mm512_maskz_broadcast_f64x2(__mmask8 __M, __m128d __A)
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_broadcast_i32x2 (__m128i __A)
{
return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si) __A,
(__v16si)_mm512_setzero_si512(),
(__mmask16) -1);
return (__m512i)__builtin_shufflevector((__v4si)__A,
(__v4si)_mm_undefined_si128(),
0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_broadcast_i32x2 (__m512i __O, __mmask16 __M, __m128i __A)
{
return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si) __A,
(__v16si)
__O, __M);
return (__m512i)__builtin_ia32_selectd_512((__mmask16)__M,
(__v16si)_mm512_broadcast_i32x2(__A),
(__v16si)__O);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_broadcast_i32x2 (__mmask16 __M, __m128i __A)
{
return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si) __A,
(__v16si)_mm512_setzero_si512 (),
__M);
return (__m512i)__builtin_ia32_selectd_512((__mmask16)__M,
(__v16si)_mm512_broadcast_i32x2(__A),
(__v16si)_mm512_setzero_si512());
}
static __inline__ __m512i __DEFAULT_FN_ATTRS

@@ -258,30 +258,6 @@ _mm512_maskz_broadcastq_epi64 (__mmask8 __M, __m128i __A)
(__v8di) _mm512_setzero_si512());
}
static __inline __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_set1_epi32(__mmask16 __M, int __A)
{
return (__m512i) __builtin_ia32_pbroadcastd512_gpr_mask (__A,
(__v16si)
_mm512_setzero_si512 (),
__M);
}
static __inline __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_set1_epi64(__mmask8 __M, long long __A)
{
#ifdef __x86_64__
return (__m512i) __builtin_ia32_pbroadcastq512_gpr_mask (__A,
(__v8di)
_mm512_setzero_si512 (),
__M);
#else
return (__m512i) __builtin_ia32_pbroadcastq512_mem_mask (__A,
(__v8di)
_mm512_setzero_si512 (),
__M);
#endif
}
static __inline __m512 __DEFAULT_FN_ATTRS
_mm512_setzero_ps(void)
@@ -340,12 +316,30 @@ _mm512_set1_epi32(int __s)
__s, __s, __s, __s, __s, __s, __s, __s };
}
static __inline __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_set1_epi32(__mmask16 __M, int __A)
{
return (__m512i)__builtin_ia32_selectd_512(__M,
(__v16si)_mm512_set1_epi32(__A),
(__v16si)_mm512_setzero_si512());
}
static __inline __m512i __DEFAULT_FN_ATTRS
_mm512_set1_epi64(long long __d)
{
return (__m512i)(__v8di){ __d, __d, __d, __d, __d, __d, __d, __d };
}
#ifdef __x86_64__
static __inline __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_set1_epi64(__mmask8 __M, long long __A)
{
return (__m512i)__builtin_ia32_selectq_512(__M,
(__v8di)_mm512_set1_epi64(__A),
(__v8di)_mm512_setzero_si512());
}
#endif
static __inline__ __m512 __DEFAULT_FN_ATTRS
_mm512_broadcastss_ps(__m128 __A)
{
@@ -4549,37 +4543,6 @@ _mm512_maskz_unpacklo_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
(__v8di)_mm512_setzero_si512());
}
/* Bit Test */
static __inline __mmask16 __DEFAULT_FN_ATTRS
_mm512_test_epi32_mask(__m512i __A, __m512i __B)
{
return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
(__v16si) __B,
(__mmask16) -1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_test_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
{
return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
(__v16si) __B, __U);
}
static __inline __mmask8 __DEFAULT_FN_ATTRS
_mm512_test_epi64_mask(__m512i __A, __m512i __B)
{
return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A,
(__v8di) __B,
(__mmask8) -1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_test_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
{
return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A, (__v8di) __B, __U);
}
/* SIMD load ops */
@@ -4850,293 +4813,105 @@ _mm512_knot(__mmask16 __M)
/* Integer compare */
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmpeq_epi32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_pcmpeqd512_mask((__v16si)__a, (__v16si)__b,
(__mmask16)-1);
}
#define _mm512_cmpeq_epi32_mask(A, B) \
_mm512_cmp_epi32_mask((A), (B), _MM_CMPINT_EQ)
#define _mm512_mask_cmpeq_epi32_mask(k, A, B) \
_mm512_mask_cmp_epi32_mask((k), (A), (B), _MM_CMPINT_EQ)
#define _mm512_cmpge_epi32_mask(A, B) \
_mm512_cmp_epi32_mask((A), (B), _MM_CMPINT_GE)
#define _mm512_mask_cmpge_epi32_mask(k, A, B) \
_mm512_mask_cmp_epi32_mask((k), (A), (B), _MM_CMPINT_GE)
#define _mm512_cmpgt_epi32_mask(A, B) \
_mm512_cmp_epi32_mask((A), (B), _MM_CMPINT_GT)
#define _mm512_mask_cmpgt_epi32_mask(k, A, B) \
_mm512_mask_cmp_epi32_mask((k), (A), (B), _MM_CMPINT_GT)
#define _mm512_cmple_epi32_mask(A, B) \
_mm512_cmp_epi32_mask((A), (B), _MM_CMPINT_LE)
#define _mm512_mask_cmple_epi32_mask(k, A, B) \
_mm512_mask_cmp_epi32_mask((k), (A), (B), _MM_CMPINT_LE)
#define _mm512_cmplt_epi32_mask(A, B) \
_mm512_cmp_epi32_mask((A), (B), _MM_CMPINT_LT)
#define _mm512_mask_cmplt_epi32_mask(k, A, B) \
_mm512_mask_cmp_epi32_mask((k), (A), (B), _MM_CMPINT_LT)
#define _mm512_cmpneq_epi32_mask(A, B) \
_mm512_cmp_epi32_mask((A), (B), _MM_CMPINT_NE)
#define _mm512_mask_cmpneq_epi32_mask(k, A, B) \
_mm512_mask_cmp_epi32_mask((k), (A), (B), _MM_CMPINT_NE)
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmpeq_epi32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_pcmpeqd512_mask((__v16si)__a, (__v16si)__b,
__u);
}
#define _mm512_cmpeq_epu32_mask(A, B) \
_mm512_cmp_epu32_mask((A), (B), _MM_CMPINT_EQ)
#define _mm512_mask_cmpeq_epu32_mask(k, A, B) \
_mm512_mask_cmp_epu32_mask((k), (A), (B), _MM_CMPINT_EQ)
#define _mm512_cmpge_epu32_mask(A, B) \
_mm512_cmp_epu32_mask((A), (B), _MM_CMPINT_GE)
#define _mm512_mask_cmpge_epu32_mask(k, A, B) \
_mm512_mask_cmp_epu32_mask((k), (A), (B), _MM_CMPINT_GE)
#define _mm512_cmpgt_epu32_mask(A, B) \
_mm512_cmp_epu32_mask((A), (B), _MM_CMPINT_GT)
#define _mm512_mask_cmpgt_epu32_mask(k, A, B) \
_mm512_mask_cmp_epu32_mask((k), (A), (B), _MM_CMPINT_GT)
#define _mm512_cmple_epu32_mask(A, B) \
_mm512_cmp_epu32_mask((A), (B), _MM_CMPINT_LE)
#define _mm512_mask_cmple_epu32_mask(k, A, B) \
_mm512_mask_cmp_epu32_mask((k), (A), (B), _MM_CMPINT_LE)
#define _mm512_cmplt_epu32_mask(A, B) \
_mm512_cmp_epu32_mask((A), (B), _MM_CMPINT_LT)
#define _mm512_mask_cmplt_epu32_mask(k, A, B) \
_mm512_mask_cmp_epu32_mask((k), (A), (B), _MM_CMPINT_LT)
#define _mm512_cmpneq_epu32_mask(A, B) \
_mm512_cmp_epu32_mask((A), (B), _MM_CMPINT_NE)
#define _mm512_mask_cmpneq_epu32_mask(k, A, B) \
_mm512_mask_cmp_epu32_mask((k), (A), (B), _MM_CMPINT_NE)
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmpeq_epu32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 0,
(__mmask16)-1);
}
#define _mm512_cmpeq_epi64_mask(A, B) \
_mm512_cmp_epi64_mask((A), (B), _MM_CMPINT_EQ)
#define _mm512_mask_cmpeq_epi64_mask(k, A, B) \
_mm512_mask_cmp_epi64_mask((k), (A), (B), _MM_CMPINT_EQ)
#define _mm512_cmpge_epi64_mask(A, B) \
_mm512_cmp_epi64_mask((A), (B), _MM_CMPINT_GE)
#define _mm512_mask_cmpge_epi64_mask(k, A, B) \
_mm512_mask_cmp_epi64_mask((k), (A), (B), _MM_CMPINT_GE)
#define _mm512_cmpgt_epi64_mask(A, B) \
_mm512_cmp_epi64_mask((A), (B), _MM_CMPINT_GT)
#define _mm512_mask_cmpgt_epi64_mask(k, A, B) \
_mm512_mask_cmp_epi64_mask((k), (A), (B), _MM_CMPINT_GT)
#define _mm512_cmple_epi64_mask(A, B) \
_mm512_cmp_epi64_mask((A), (B), _MM_CMPINT_LE)
#define _mm512_mask_cmple_epi64_mask(k, A, B) \
_mm512_mask_cmp_epi64_mask((k), (A), (B), _MM_CMPINT_LE)
#define _mm512_cmplt_epi64_mask(A, B) \
_mm512_cmp_epi64_mask((A), (B), _MM_CMPINT_LT)
#define _mm512_mask_cmplt_epi64_mask(k, A, B) \
_mm512_mask_cmp_epi64_mask((k), (A), (B), _MM_CMPINT_LT)
#define _mm512_cmpneq_epi64_mask(A, B) \
_mm512_cmp_epi64_mask((A), (B), _MM_CMPINT_NE)
#define _mm512_mask_cmpneq_epi64_mask(k, A, B) \
_mm512_mask_cmp_epi64_mask((k), (A), (B), _MM_CMPINT_NE)
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmpeq_epu32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 0,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmpeq_epi64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_pcmpeqq512_mask((__v8di)__a, (__v8di)__b,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmpeq_epi64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_pcmpeqq512_mask((__v8di)__a, (__v8di)__b,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmpeq_epu64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 0,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmpeq_epu64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 0,
__u);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmpge_epi32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_cmpd512_mask((__v16si)__a, (__v16si)__b, 5,
(__mmask16)-1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmpge_epi32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_cmpd512_mask((__v16si)__a, (__v16si)__b, 5,
__u);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmpge_epu32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 5,
(__mmask16)-1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmpge_epu32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 5,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmpge_epi64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_cmpq512_mask((__v8di)__a, (__v8di)__b, 5,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmpge_epi64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_cmpq512_mask((__v8di)__a, (__v8di)__b, 5,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmpge_epu64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 5,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmpge_epu64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 5,
__u);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmpgt_epi32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_pcmpgtd512_mask((__v16si)__a, (__v16si)__b,
(__mmask16)-1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmpgt_epi32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_pcmpgtd512_mask((__v16si)__a, (__v16si)__b,
__u);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmpgt_epu32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 6,
(__mmask16)-1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmpgt_epu32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 6,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmpgt_epi64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_pcmpgtq512_mask((__v8di)__a, (__v8di)__b,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmpgt_epi64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_pcmpgtq512_mask((__v8di)__a, (__v8di)__b,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmpgt_epu64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 6,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmpgt_epu64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 6,
__u);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmple_epi32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_cmpd512_mask((__v16si)__a, (__v16si)__b, 2,
(__mmask16)-1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmple_epi32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_cmpd512_mask((__v16si)__a, (__v16si)__b, 2,
__u);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmple_epu32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 2,
(__mmask16)-1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmple_epu32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 2,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmple_epi64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_cmpq512_mask((__v8di)__a, (__v8di)__b, 2,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmple_epi64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_cmpq512_mask((__v8di)__a, (__v8di)__b, 2,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmple_epu64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 2,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmple_epu64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 2,
__u);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmplt_epi32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_cmpd512_mask((__v16si)__a, (__v16si)__b, 1,
(__mmask16)-1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmplt_epi32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_cmpd512_mask((__v16si)__a, (__v16si)__b, 1,
__u);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmplt_epu32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 1,
(__mmask16)-1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmplt_epu32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 1,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmplt_epi64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_cmpq512_mask((__v8di)__a, (__v8di)__b, 1,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmplt_epi64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_cmpq512_mask((__v8di)__a, (__v8di)__b, 1,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmplt_epu64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 1,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmplt_epu64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 1,
__u);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmpneq_epi32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_cmpd512_mask((__v16si)__a, (__v16si)__b, 4,
(__mmask16)-1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmpneq_epi32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_cmpd512_mask((__v16si)__a, (__v16si)__b, 4,
__u);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_cmpneq_epu32_mask(__m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 4,
(__mmask16)-1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_cmpneq_epu32_mask(__mmask16 __u, __m512i __a, __m512i __b) {
return (__mmask16)__builtin_ia32_ucmpd512_mask((__v16si)__a, (__v16si)__b, 4,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmpneq_epi64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_cmpq512_mask((__v8di)__a, (__v8di)__b, 4,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmpneq_epi64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_cmpq512_mask((__v8di)__a, (__v8di)__b, 4,
__u);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_cmpneq_epu64_mask(__m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 4,
(__mmask8)-1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_cmpneq_epu64_mask(__mmask8 __u, __m512i __a, __m512i __b) {
return (__mmask8)__builtin_ia32_ucmpq512_mask((__v8di)__a, (__v8di)__b, 4,
__u);
}
#define _mm512_cmpeq_epu64_mask(A, B) \
_mm512_cmp_epu64_mask((A), (B), _MM_CMPINT_EQ)
#define _mm512_mask_cmpeq_epu64_mask(k, A, B) \
_mm512_mask_cmp_epu64_mask((k), (A), (B), _MM_CMPINT_EQ)
#define _mm512_cmpge_epu64_mask(A, B) \
_mm512_cmp_epu64_mask((A), (B), _MM_CMPINT_GE)
#define _mm512_mask_cmpge_epu64_mask(k, A, B) \
_mm512_mask_cmp_epu64_mask((k), (A), (B), _MM_CMPINT_GE)
#define _mm512_cmpgt_epu64_mask(A, B) \
_mm512_cmp_epu64_mask((A), (B), _MM_CMPINT_GT)
#define _mm512_mask_cmpgt_epu64_mask(k, A, B) \
_mm512_mask_cmp_epu64_mask((k), (A), (B), _MM_CMPINT_GT)
#define _mm512_cmple_epu64_mask(A, B) \
_mm512_cmp_epu64_mask((A), (B), _MM_CMPINT_LE)
#define _mm512_mask_cmple_epu64_mask(k, A, B) \
_mm512_mask_cmp_epu64_mask((k), (A), (B), _MM_CMPINT_LE)
#define _mm512_cmplt_epu64_mask(A, B) \
_mm512_cmp_epu64_mask((A), (B), _MM_CMPINT_LT)
#define _mm512_mask_cmplt_epu64_mask(k, A, B) \
_mm512_mask_cmp_epu64_mask((k), (A), (B), _MM_CMPINT_LT)
#define _mm512_cmpneq_epu64_mask(A, B) \
_mm512_cmp_epu64_mask((A), (B), _MM_CMPINT_NE)
#define _mm512_mask_cmpneq_epu64_mask(k, A, B) \
_mm512_mask_cmp_epu64_mask((k), (A), (B), _MM_CMPINT_NE)
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_cvtepi8_epi32(__m128i __A)
@@ -6803,35 +6578,6 @@ _mm512_maskz_permutex2var_ps (__mmask16 __U, __m512 __A, __m512i __I,
(__mmask16) __U);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_testn_epi32_mask (__m512i __A, __m512i __B)
{
return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
(__v16si) __B,
(__mmask16) -1);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_testn_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
{
return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
(__v16si) __B, __U);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_testn_epi64_mask (__m512i __A, __m512i __B)
{
return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
(__v8di) __B,
(__mmask8) -1);
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_testn_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
{
return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
(__v8di) __B, __U);
}
#define _mm512_cvtt_roundpd_epu32(A, R) __extension__ ({ \
(__m256i)__builtin_ia32_cvttpd2udq512_mask((__v8df)(__m512d)(A), \
@@ -7200,76 +6946,100 @@ _mm512_maskz_srai_epi64(__mmask8 __U, __m512i __A, int __B)
}
#define _mm512_shuffle_f32x4(A, B, imm) __extension__ ({ \
(__m512)__builtin_ia32_shuf_f32x4_mask((__v16sf)(__m512)(A), \
(__v16sf)(__m512)(B), (int)(imm), \
(__v16sf)_mm512_undefined_ps(), \
(__mmask16)-1); })
(__m512)__builtin_shufflevector((__v16sf)(__m512)(A), \
(__v16sf)(__m512)(B), \
0 + ((((imm) >> 0) & 0x3) * 4), \
1 + ((((imm) >> 0) & 0x3) * 4), \
2 + ((((imm) >> 0) & 0x3) * 4), \
3 + ((((imm) >> 0) & 0x3) * 4), \
0 + ((((imm) >> 2) & 0x3) * 4), \
1 + ((((imm) >> 2) & 0x3) * 4), \
2 + ((((imm) >> 2) & 0x3) * 4), \
3 + ((((imm) >> 2) & 0x3) * 4), \
16 + ((((imm) >> 4) & 0x3) * 4), \
17 + ((((imm) >> 4) & 0x3) * 4), \
18 + ((((imm) >> 4) & 0x3) * 4), \
19 + ((((imm) >> 4) & 0x3) * 4), \
16 + ((((imm) >> 6) & 0x3) * 4), \
17 + ((((imm) >> 6) & 0x3) * 4), \
18 + ((((imm) >> 6) & 0x3) * 4), \
19 + ((((imm) >> 6) & 0x3) * 4)); })
#define _mm512_mask_shuffle_f32x4(W, U, A, B, imm) __extension__ ({ \
(__m512)__builtin_ia32_shuf_f32x4_mask((__v16sf)(__m512)(A), \
(__v16sf)(__m512)(B), (int)(imm), \
(__v16sf)(__m512)(W), \
(__mmask16)(U)); })
(__m512)__builtin_ia32_selectps_512((__mmask16)(U), \
(__v16sf)_mm512_shuffle_f32x4((A), (B), (imm)), \
(__v16sf)(__m512)(W)); })
#define _mm512_maskz_shuffle_f32x4(U, A, B, imm) __extension__ ({ \
(__m512)__builtin_ia32_shuf_f32x4_mask((__v16sf)(__m512)(A), \
(__v16sf)(__m512)(B), (int)(imm), \
(__v16sf)_mm512_setzero_ps(), \
(__mmask16)(U)); })
(__m512)__builtin_ia32_selectps_512((__mmask16)(U), \
(__v16sf)_mm512_shuffle_f32x4((A), (B), (imm)), \
(__v16sf)_mm512_setzero_ps()); })
#define _mm512_shuffle_f64x2(A, B, imm) __extension__ ({ \
(__m512d)__builtin_ia32_shuf_f64x2_mask((__v8df)(__m512d)(A), \
(__v8df)(__m512d)(B), (int)(imm), \
(__v8df)_mm512_undefined_pd(), \
(__mmask8)-1); })
(__m512d)__builtin_shufflevector((__v8df)(__m512d)(A), \
(__v8df)(__m512d)(B), \
0 + ((((imm) >> 0) & 0x3) * 2), \
1 + ((((imm) >> 0) & 0x3) * 2), \
0 + ((((imm) >> 2) & 0x3) * 2), \
1 + ((((imm) >> 2) & 0x3) * 2), \
8 + ((((imm) >> 4) & 0x3) * 2), \
9 + ((((imm) >> 4) & 0x3) * 2), \
8 + ((((imm) >> 6) & 0x3) * 2), \
9 + ((((imm) >> 6) & 0x3) * 2)); })
#define _mm512_mask_shuffle_f64x2(W, U, A, B, imm) __extension__ ({ \
(__m512d)__builtin_ia32_shuf_f64x2_mask((__v8df)(__m512d)(A), \
(__v8df)(__m512d)(B), (int)(imm), \
(__v8df)(__m512d)(W), \
(__mmask8)(U)); })
(__m512d)__builtin_ia32_selectpd_512((__mmask8)(U), \
(__v8df)_mm512_shuffle_f64x2((A), (B), (imm)), \
(__v8df)(__m512d)(W)); })
#define _mm512_maskz_shuffle_f64x2(U, A, B, imm) __extension__ ({ \
(__m512d)__builtin_ia32_shuf_f64x2_mask((__v8df)(__m512d)(A), \
(__v8df)(__m512d)(B), (int)(imm), \
(__v8df)_mm512_setzero_pd(), \
(__mmask8)(U)); })
(__m512d)__builtin_ia32_selectpd_512((__mmask8)(U), \
(__v8df)_mm512_shuffle_f64x2((A), (B), (imm)), \
(__v8df)_mm512_setzero_pd()); })
#define _mm512_shuffle_i32x4(A, B, imm) __extension__ ({ \
(__m512i)__builtin_ia32_shuf_i32x4_mask((__v16si)(__m512i)(A), \
(__v16si)(__m512i)(B), (int)(imm), \
(__v16si)_mm512_setzero_si512(), \
(__mmask16)-1); })
(__m512i)__builtin_shufflevector((__v8di)(__m512i)(A), \
(__v8di)(__m512i)(B), \
0 + ((((imm) >> 0) & 0x3) * 2), \
1 + ((((imm) >> 0) & 0x3) * 2), \
0 + ((((imm) >> 2) & 0x3) * 2), \
1 + ((((imm) >> 2) & 0x3) * 2), \
8 + ((((imm) >> 4) & 0x3) * 2), \
9 + ((((imm) >> 4) & 0x3) * 2), \
8 + ((((imm) >> 6) & 0x3) * 2), \
9 + ((((imm) >> 6) & 0x3) * 2)); })
#define _mm512_mask_shuffle_i32x4(W, U, A, B, imm) __extension__ ({ \
(__m512i)__builtin_ia32_shuf_i32x4_mask((__v16si)(__m512i)(A), \
(__v16si)(__m512i)(B), (int)(imm), \
(__v16si)(__m512i)(W), \
(__mmask16)(U)); })
(__m512i)__builtin_ia32_selectd_512((__mmask16)(U), \
(__v16si)_mm512_shuffle_i32x4((A), (B), (imm)), \
(__v16si)(__m512i)(W)); })
#define _mm512_maskz_shuffle_i32x4(U, A, B, imm) __extension__ ({ \
(__m512i)__builtin_ia32_shuf_i32x4_mask((__v16si)(__m512i)(A), \
(__v16si)(__m512i)(B), (int)(imm), \
(__v16si)_mm512_setzero_si512(), \
(__mmask16)(U)); })
(__m512i)__builtin_ia32_selectd_512((__mmask16)(U), \
(__v16si)_mm512_shuffle_i32x4((A), (B), (imm)), \
(__v16si)_mm512_setzero_si512()); })
#define _mm512_shuffle_i64x2(A, B, imm) __extension__ ({ \
(__m512i)__builtin_ia32_shuf_i64x2_mask((__v8di)(__m512i)(A), \
(__v8di)(__m512i)(B), (int)(imm), \
(__v8di)_mm512_setzero_si512(), \
(__mmask8)-1); })
(__m512i)__builtin_shufflevector((__v8di)(__m512i)(A), \
(__v8di)(__m512i)(B), \
0 + ((((imm) >> 0) & 0x3) * 2), \
1 + ((((imm) >> 0) & 0x3) * 2), \
0 + ((((imm) >> 2) & 0x3) * 2), \
1 + ((((imm) >> 2) & 0x3) * 2), \
8 + ((((imm) >> 4) & 0x3) * 2), \
9 + ((((imm) >> 4) & 0x3) * 2), \
8 + ((((imm) >> 6) & 0x3) * 2), \
9 + ((((imm) >> 6) & 0x3) * 2)); })
#define _mm512_mask_shuffle_i64x2(W, U, A, B, imm) __extension__ ({ \
(__m512i)__builtin_ia32_shuf_i64x2_mask((__v8di)(__m512i)(A), \
(__v8di)(__m512i)(B), (int)(imm), \
(__v8di)(__m512i)(W), \
(__mmask8)(U)); })
(__m512i)__builtin_ia32_selectq_512((__mmask8)(U), \
(__v8di)_mm512_shuffle_i64x2((A), (B), (imm)), \
(__v8di)(__m512i)(W)); })
#define _mm512_maskz_shuffle_i64x2(U, A, B, imm) __extension__ ({ \
(__m512i)__builtin_ia32_shuf_i64x2_mask((__v8di)(__m512i)(A), \
(__v8di)(__m512i)(B), (int)(imm), \
(__v8di)_mm512_setzero_si512(), \
(__mmask8)(U)); })
(__m512i)__builtin_ia32_selectq_512((__mmask8)(U), \
(__v8di)_mm512_shuffle_i64x2((A), (B), (imm)), \
(__v8di)_mm512_setzero_si512()); })
#define _mm512_shuffle_pd(A, B, M) __extension__ ({ \
(__m512d)__builtin_shufflevector((__v8df)(__m512d)(A), \
@@ -9017,7 +8787,7 @@ _mm512_kortestz (__mmask16 __A, __mmask16 __B)
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_kunpackb (__mmask16 __A, __mmask16 __B)
{
return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
return (__mmask16) (( __A & 0xFF) | ( __B << 8));
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
@@ -9040,7 +8810,7 @@ _mm512_stream_si512 (__m512i * __P, __m512i __A)
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_stream_load_si512 (void *__P)
_mm512_stream_load_si512 (void const *__P)
{
typedef __v8di __v8di_aligned __attribute__((aligned(64)));
return (__m512i) __builtin_nontemporal_load((const __v8di_aligned *)__P);
@@ -9172,6 +8942,64 @@ _mm512_maskz_compress_epi32 (__mmask16 __U, __m512i __A)
(__mmask8)(M), \
_MM_FROUND_CUR_DIRECTION); })
/* Bit Test */
static __inline __mmask16 __DEFAULT_FN_ATTRS
_mm512_test_epi32_mask (__m512i __A, __m512i __B)
{
return _mm512_cmpneq_epi32_mask (_mm512_and_epi32(__A, __B),
_mm512_setzero_epi32());
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_test_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
{
return _mm512_mask_cmpneq_epi32_mask (__U, _mm512_and_epi32 (__A, __B),
_mm512_setzero_epi32());
}
static __inline __mmask8 __DEFAULT_FN_ATTRS
_mm512_test_epi64_mask (__m512i __A, __m512i __B)
{
return _mm512_cmpneq_epi64_mask (_mm512_and_epi32 (__A, __B),
_mm512_setzero_epi32());
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_test_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
{
return _mm512_mask_cmpneq_epi64_mask (__U, _mm512_and_epi32 (__A, __B),
_mm512_setzero_epi32());
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_testn_epi32_mask (__m512i __A, __m512i __B)
{
return _mm512_cmpeq_epi32_mask (_mm512_and_epi32 (__A, __B),
_mm512_setzero_epi32());
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm512_mask_testn_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
{
return _mm512_mask_cmpeq_epi32_mask (__U, _mm512_and_epi32 (__A, __B),
_mm512_setzero_epi32());
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_testn_epi64_mask (__m512i __A, __m512i __B)
{
return _mm512_cmpeq_epi64_mask (_mm512_and_epi32 (__A, __B),
_mm512_setzero_epi32());
}
static __inline__ __mmask8 __DEFAULT_FN_ATTRS
_mm512_mask_testn_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
{
return _mm512_mask_cmpeq_epi64_mask (__U, _mm512_and_epi32 (__A, __B),
_mm512_setzero_epi32());
}
static __inline__ __m512 __DEFAULT_FN_ATTRS
_mm512_movehdup_ps (__m512 __A)
{
@@ -9742,16 +9570,18 @@ _mm_cvtu64_ss (__m128 __A, unsigned long long __B)
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_set1_epi32 (__m512i __O, __mmask16 __M, int __A)
{
return (__m512i) __builtin_ia32_pbroadcastd512_gpr_mask (__A, (__v16si) __O,
__M);
return (__m512i) __builtin_ia32_selectd_512(__M,
(__v16si) _mm512_set1_epi32(__A),
(__v16si) __O);
}
#ifdef __x86_64__
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_set1_epi64 (__m512i __O, __mmask8 __M, long long __A)
{
return (__m512i) __builtin_ia32_pbroadcastq512_gpr_mask (__A, (__v8di) __O,
__M);
return (__m512i) __builtin_ia32_selectq_512(__M,
(__v8di) _mm512_set1_epi64(__A),
(__v8di) __O);
}
#endif

@@ -0,0 +1,391 @@
/*===------------- avx512vbmi2intrin.h - VBMI2 intrinsics ------------------===
*
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error "Never use <avx512vbmi2intrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __AVX512VBMI2INTRIN_H
#define __AVX512VBMI2INTRIN_H
/* Define the default attributes for the functions in this file. */
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx512vbmi2")))
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_compress_epi16(__m512i __S, __mmask32 __U, __m512i __D)
{
return (__m512i) __builtin_ia32_compresshi512_mask ((__v32hi) __D,
(__v32hi) __S,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_compress_epi16(__mmask32 __U, __m512i __D)
{
return (__m512i) __builtin_ia32_compresshi512_mask ((__v32hi) __D,
(__v32hi) _mm512_setzero_hi(),
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_compress_epi8(__m512i __S, __mmask64 __U, __m512i __D)
{
return (__m512i) __builtin_ia32_compressqi512_mask ((__v64qi) __D,
(__v64qi) __S,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_compress_epi8(__mmask64 __U, __m512i __D)
{
return (__m512i) __builtin_ia32_compressqi512_mask ((__v64qi) __D,
(__v64qi) _mm512_setzero_qi(),
__U);
}
static __inline__ void __DEFAULT_FN_ATTRS
_mm512_mask_compressstoreu_epi16(void *__P, __mmask32 __U, __m512i __D)
{
__builtin_ia32_compressstorehi512_mask ((__v32hi *) __P, (__v32hi) __D,
__U);
}
static __inline__ void __DEFAULT_FN_ATTRS
_mm512_mask_compressstoreu_epi8(void *__P, __mmask64 __U, __m512i __D)
{
__builtin_ia32_compressstoreqi512_mask ((__v64qi *) __P, (__v64qi) __D,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_expand_epi16(__m512i __S, __mmask32 __U, __m512i __D)
{
return (__m512i) __builtin_ia32_expandhi512_mask ((__v32hi) __D,
(__v32hi) __S,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_expand_epi16(__mmask32 __U, __m512i __D)
{
return (__m512i) __builtin_ia32_expandhi512_mask ((__v32hi) __D,
(__v32hi) _mm512_setzero_hi(),
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_expand_epi8(__m512i __S, __mmask64 __U, __m512i __D)
{
return (__m512i) __builtin_ia32_expandqi512_mask ((__v64qi) __D,
(__v64qi) __S,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_expand_epi8(__mmask64 __U, __m512i __D)
{
return (__m512i) __builtin_ia32_expandqi512_mask ((__v64qi) __D,
(__v64qi) _mm512_setzero_qi(),
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_expandloadu_epi16(__m512i __S, __mmask32 __U, void const *__P)
{
return (__m512i) __builtin_ia32_expandloadhi512_mask ((const __v32hi *)__P,
(__v32hi) __S,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_expandloadu_epi16(__mmask32 __U, void const *__P)
{
return (__m512i) __builtin_ia32_expandloadhi512_mask ((const __v32hi *)__P,
(__v32hi) _mm512_setzero_hi(),
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_expandloadu_epi8(__m512i __S, __mmask64 __U, void const *__P)
{
return (__m512i) __builtin_ia32_expandloadqi512_mask ((const __v64qi *)__P,
(__v64qi) __S,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_expandloadu_epi8(__mmask64 __U, void const *__P)
{
return (__m512i) __builtin_ia32_expandloadqi512_mask ((const __v64qi *)__P,
(__v64qi) _mm512_setzero_qi(),
__U);
}
#define _mm512_mask_shldi_epi64(S, U, A, B, I) __extension__ ({ \
(__m512i)__builtin_ia32_vpshldq512_mask((__v8di)(A), \
(__v8di)(B), \
(int)(I), \
(__v8di)(S), \
(__mmask8)(U)); })
#define _mm512_maskz_shldi_epi64(U, A, B, I) \
_mm512_mask_shldi_epi64(_mm512_setzero_hi(), (U), (A), (B), (I))
#define _mm512_shldi_epi64(A, B, I) \
_mm512_mask_shldi_epi64(_mm512_undefined(), (__mmask8)(-1), (A), (B), (I))
#define _mm512_mask_shldi_epi32(S, U, A, B, I) __extension__ ({ \
(__m512i)__builtin_ia32_vpshldd512_mask((__v16si)(A), \
(__v16si)(B), \
(int)(I), \
(__v16si)(S), \
(__mmask16)(U)); })
#define _mm512_maskz_shldi_epi32(U, A, B, I) \
_mm512_mask_shldi_epi32(_mm512_setzero_hi(), (U), (A), (B), (I))
#define _mm512_shldi_epi32(A, B, I) \
_mm512_mask_shldi_epi32(_mm512_undefined(), (__mmask16)(-1), (A), (B), (I))
#define _mm512_mask_shldi_epi16(S, U, A, B, I) __extension__ ({ \
(__m512i)__builtin_ia32_vpshldw512_mask((__v32hi)(A), \
(__v32hi)(B), \
(int)(I), \
(__v32hi)(S), \
(__mmask32)(U)); })
#define _mm512_maskz_shldi_epi16(U, A, B, I) \
_mm512_mask_shldi_epi16(_mm512_setzero_hi(), (U), (A), (B), (I))
#define _mm512_shldi_epi16(A, B, I) \
_mm512_mask_shldi_epi16(_mm512_undefined(), (__mmask32)(-1), (A), (B), (I))
#define _mm512_mask_shrdi_epi64(S, U, A, B, I) __extension__ ({ \
(__m512i)__builtin_ia32_vpshrdq512_mask((__v8di)(A), \
(__v8di)(B), \
(int)(I), \
(__v8di)(S), \
(__mmask8)(U)); })
#define _mm512_maskz_shrdi_epi64(U, A, B, I) \
_mm512_mask_shrdi_epi64(_mm512_setzero_hi(), (U), (A), (B), (I))
#define _mm512_shrdi_epi64(A, B, I) \
_mm512_mask_shrdi_epi64(_mm512_undefined(), (__mmask8)(-1), (A), (B), (I))
#define _mm512_mask_shrdi_epi32(S, U, A, B, I) __extension__ ({ \
(__m512i)__builtin_ia32_vpshrdd512_mask((__v16si)(A), \
(__v16si)(B), \
(int)(I), \
(__v16si)(S), \
(__mmask16)(U)); })
#define _mm512_maskz_shrdi_epi32(U, A, B, I) \
_mm512_mask_shrdi_epi32(_mm512_setzero_hi(), (U), (A), (B), (I))
#define _mm512_shrdi_epi32(A, B, I) \
_mm512_mask_shrdi_epi32(_mm512_undefined(), (__mmask16)(-1), (A), (B), (I))
#define _mm512_mask_shrdi_epi16(S, U, A, B, I) __extension__ ({ \
(__m512i)__builtin_ia32_vpshrdw512_mask((__v32hi)(A), \
(__v32hi)(B), \
(int)(I), \
(__v32hi)(S), \
(__mmask32)(U)); })
#define _mm512_maskz_shrdi_epi16(U, A, B, I) \
_mm512_mask_shrdi_epi16(_mm512_setzero_hi(), (U), (A), (B), (I))
#define _mm512_shrdi_epi16(A, B, I) \
_mm512_mask_shrdi_epi16(_mm512_undefined(), (__mmask32)(-1), (A), (B), (I))
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_shldv_epi64(__m512i __S, __mmask8 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshldvq512_mask ((__v8di) __S,
(__v8di) __A,
(__v8di) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_shldv_epi64(__mmask8 __U, __m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshldvq512_maskz ((__v8di) __S,
(__v8di) __A,
(__v8di) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_shldv_epi64(__m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshldvq512_mask ((__v8di) __S,
(__v8di) __A,
(__v8di) __B,
(__mmask8) -1);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_shldv_epi32(__m512i __S, __mmask16 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshldvd512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_shldv_epi32(__mmask16 __U, __m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshldvd512_maskz ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_shldv_epi32(__m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshldvd512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) -1);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_shldv_epi16(__m512i __S, __mmask32 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshldvw512_mask ((__v32hi) __S,
(__v32hi) __A,
(__v32hi) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_shldv_epi16(__mmask32 __U, __m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshldvw512_maskz ((__v32hi) __S,
(__v32hi) __A,
(__v32hi) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_shldv_epi16(__m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshldvw512_mask ((__v32hi) __S,
(__v32hi) __A,
(__v32hi) __B,
(__mmask32) -1);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_shrdv_epi64(__m512i __S, __mmask8 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshrdvq512_mask ((__v8di) __S,
(__v8di) __A,
(__v8di) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_shrdv_epi64(__mmask8 __U, __m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshrdvq512_maskz ((__v8di) __S,
(__v8di) __A,
(__v8di) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_shrdv_epi64(__m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshrdvq512_mask ((__v8di) __S,
(__v8di) __A,
(__v8di) __B,
(__mmask8) -1);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_shrdv_epi32(__m512i __S, __mmask16 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshrdvd512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_shrdv_epi32(__mmask16 __U, __m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshrdvd512_maskz ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_shrdv_epi32(__m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshrdvd512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) -1);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_shrdv_epi16(__m512i __S, __mmask32 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshrdvw512_mask ((__v32hi) __S,
(__v32hi) __A,
(__v32hi) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_shrdv_epi16(__mmask32 __U, __m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshrdvw512_maskz ((__v32hi) __S,
(__v32hi) __A,
(__v32hi) __B,
__U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_shrdv_epi16(__m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpshrdvw512_mask ((__v32hi) __S,
(__v32hi) __A,
(__v32hi) __B,
(__mmask32) -1);
}
#undef __DEFAULT_FN_ATTRS
#endif

@@ -0,0 +1,157 @@
/*===------------- avx512vlbitalgintrin.h - BITALG intrinsics ------------------===
*
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error "Never use <avx512vlbitalgintrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __AVX512VLBITALGINTRIN_H
#define __AVX512VLBITALGINTRIN_H
/* Define the default attributes for the functions in this file. */
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx512vl,avx512bitalg")))
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_popcnt_epi16(__m256i __A)
{
return (__m256i) __builtin_ia32_vpopcntw_256((__v16hi) __A);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_popcnt_epi16(__m256i __A, __mmask16 __U, __m256i __B)
{
return (__m256i) __builtin_ia32_selectw_256((__mmask16) __U,
(__v16hi) _mm256_popcnt_epi16(__B),
(__v16hi) __A);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_popcnt_epi16(__mmask16 __U, __m256i __B)
{
return _mm256_mask_popcnt_epi16((__m256i) _mm256_setzero_si256(),
__U,
__B);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_popcnt_epi16(__m128i __A)
{
return (__m128i) __builtin_ia32_vpopcntw_128((__v8hi) __A);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_popcnt_epi16(__m128i __A, __mmask8 __U, __m128i __B)
{
return (__m128i) __builtin_ia32_selectw_128((__mmask8) __U,
(__v8hi) _mm128_popcnt_epi16(__B),
(__v8hi) __A);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_popcnt_epi16(__mmask8 __U, __m128i __B)
{
return _mm128_mask_popcnt_epi16((__m128i) _mm_setzero_si128(),
__U,
__B);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_popcnt_epi8(__m256i __A)
{
return (__m256i) __builtin_ia32_vpopcntb_256((__v32qi) __A);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_popcnt_epi8(__m256i __A, __mmask32 __U, __m256i __B)
{
return (__m256i) __builtin_ia32_selectb_256((__mmask32) __U,
(__v32qi) _mm256_popcnt_epi8(__B),
(__v32qi) __A);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_popcnt_epi8(__mmask32 __U, __m256i __B)
{
return _mm256_mask_popcnt_epi8((__m256i) _mm256_setzero_si256(),
__U,
__B);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_popcnt_epi8(__m128i __A)
{
return (__m128i) __builtin_ia32_vpopcntb_128((__v16qi) __A);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_popcnt_epi8(__m128i __A, __mmask16 __U, __m128i __B)
{
return (__m128i) __builtin_ia32_selectb_128((__mmask16) __U,
(__v16qi) _mm128_popcnt_epi8(__B),
(__v16qi) __A);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_popcnt_epi8(__mmask16 __U, __m128i __B)
{
return _mm128_mask_popcnt_epi8((__m128i) _mm_setzero_si128(),
__U,
__B);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm256_mask_bitshuffle_epi32_mask(__mmask32 __U, __m256i __A, __m256i __B)
{
return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask((__v32qi) __A,
(__v32qi) __B,
__U);
}
static __inline__ __mmask32 __DEFAULT_FN_ATTRS
_mm256_bitshuffle_epi32_mask(__m256i __A, __m256i __B)
{
return _mm256_mask_bitshuffle_epi32_mask((__mmask32) -1,
__A,
__B);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm128_mask_bitshuffle_epi16_mask(__mmask16 __U, __m128i __A, __m128i __B)
{
return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask((__v16qi) __A,
(__v16qi) __B,
__U);
}
static __inline__ __mmask16 __DEFAULT_FN_ATTRS
_mm128_bitshuffle_epi16_mask(__m128i __A, __m128i __B)
{
return _mm128_mask_bitshuffle_epi16_mask((__mmask16) -1,
__A,
__B);
}
#undef __DEFAULT_FN_ATTRS
#endif
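
The mask and maskz variants in the header above follow one pattern: compute the unmasked result, then select per lane between it and a pass-through value (the `__A` operand for mask, zero for maskz). Below is a minimal scalar sketch of that pattern for 16-bit lanes, in portable C. The names `popcnt16`, `model_mask_popcnt_epi16`, `model_maskz_popcnt_epi16` and `model_popcnt_selfcheck` are hypothetical and not part of the header; the sketch models lane semantics only and does not use the builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model, not part of the header: popcount of one 16-bit lane. */
static uint16_t popcnt16(uint16_t x)
{
    uint16_t n = 0;
    while (x) {
        x &= (uint16_t)(x - 1); /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Merge-masking: lane i takes popcnt(b[i]) when bit i of u is set,
   otherwise it keeps the pass-through value a[i]. */
static void model_mask_popcnt_epi16(uint16_t *dst, const uint16_t *a,
                                    uint16_t u, const uint16_t *b, int lanes)
{
    for (int i = 0; i < lanes; i++)
        dst[i] = ((u >> i) & 1) ? popcnt16(b[i]) : a[i];
}

/* Zero-masking: same selection, with zero as the pass-through value. */
static void model_maskz_popcnt_epi16(uint16_t *dst, uint16_t u,
                                     const uint16_t *b, int lanes)
{
    for (int i = 0; i < lanes; i++)
        dst[i] = ((u >> i) & 1) ? popcnt16(b[i]) : 0;
}

/* Small self-check on four lanes; returns 1 if every assertion holds. */
static int model_popcnt_selfcheck(void)
{
    const uint16_t a[4] = { 100, 101, 102, 103 };
    const uint16_t b[4] = { 0x0001, 0x0003, 0x00FF, 0xFFFF };
    uint16_t d[4];

    model_mask_popcnt_epi16(d, a, 0x5, b, 4);   /* lanes 0 and 2 selected */
    assert(d[0] == 1 && d[1] == 101 && d[2] == 8 && d[3] == 103);

    model_maskz_popcnt_epi16(d, 0xA, b, 4);     /* lanes 1 and 3 selected */
    assert(d[0] == 0 && d[1] == 2 && d[2] == 0 && d[3] == 16);
    return 1;
}
```

The maskz form is written out separately here for readability; the header instead expresses it as the mask form with a zero vector as the first operand.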

File diff suppressed because it is too large


@@ -33,26 +33,26 @@
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_broadcastmb_epi64 (__mmask8 __A)
{
return (__m128i) __builtin_ia32_broadcastmb128 (__A);
{
return (__m128i) _mm_set1_epi64x((long long) __A);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_broadcastmb_epi64 (__mmask8 __A)
{
return (__m256i) __builtin_ia32_broadcastmb256 (__A);
return (__m256i) _mm256_set1_epi64x((long long)__A);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_broadcastmw_epi32 (__mmask16 __A)
{
return (__m128i) __builtin_ia32_broadcastmw128 (__A);
return (__m128i) _mm_set1_epi32((int)__A);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_broadcastmw_epi32 (__mmask16 __A)
{
return (__m256i) __builtin_ia32_broadcastmw256 (__A);
return (__m256i) _mm256_set1_epi32((int)__A);
}


@@ -978,25 +978,25 @@ _mm256_movepi64_mask (__m256i __A)
static __inline__ __m256 __DEFAULT_FN_ATTRS
_mm256_broadcast_f32x2 (__m128 __A)
{
return (__m256) __builtin_ia32_broadcastf32x2_256_mask ((__v4sf) __A,
(__v8sf)_mm256_undefined_ps(),
(__mmask8) -1);
return (__m256)__builtin_shufflevector((__v4sf)__A,
(__v4sf)_mm_undefined_ps(),
0, 1, 0, 1, 0, 1, 0, 1);
}
static __inline__ __m256 __DEFAULT_FN_ATTRS
_mm256_mask_broadcast_f32x2 (__m256 __O, __mmask8 __M, __m128 __A)
{
return (__m256) __builtin_ia32_broadcastf32x2_256_mask ((__v4sf) __A,
(__v8sf) __O,
__M);
return (__m256)__builtin_ia32_selectps_256((__mmask8)__M,
(__v8sf)_mm256_broadcast_f32x2(__A),
(__v8sf)__O);
}
static __inline__ __m256 __DEFAULT_FN_ATTRS
_mm256_maskz_broadcast_f32x2 (__mmask8 __M, __m128 __A)
{
return (__m256) __builtin_ia32_broadcastf32x2_256_mask ((__v4sf) __A,
(__v8sf) _mm256_setzero_ps (),
__M);
return (__m256)__builtin_ia32_selectps_256((__mmask8)__M,
(__v8sf)_mm256_broadcast_f32x2(__A),
(__v8sf)_mm256_setzero_ps());
}
static __inline__ __m256d __DEFAULT_FN_ATTRS
@@ -1025,49 +1025,49 @@ _mm256_maskz_broadcast_f64x2 (__mmask8 __M, __m128d __A)
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_broadcast_i32x2 (__m128i __A)
{
return (__m128i) __builtin_ia32_broadcasti32x2_128_mask ((__v4si) __A,
(__v4si)_mm_undefined_si128(),
(__mmask8) -1);
return (__m128i)__builtin_shufflevector((__v4si)__A,
(__v4si)_mm_undefined_si128(),
0, 1, 0, 1);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_mask_broadcast_i32x2 (__m128i __O, __mmask8 __M, __m128i __A)
{
return (__m128i) __builtin_ia32_broadcasti32x2_128_mask ((__v4si) __A,
(__v4si) __O,
__M);
return (__m128i)__builtin_ia32_selectd_128((__mmask8)__M,
(__v4si)_mm_broadcast_i32x2(__A),
(__v4si)__O);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_maskz_broadcast_i32x2 (__mmask8 __M, __m128i __A)
{
return (__m128i) __builtin_ia32_broadcasti32x2_128_mask ((__v4si) __A,
(__v4si) _mm_setzero_si128 (),
__M);
return (__m128i)__builtin_ia32_selectd_128((__mmask8)__M,
(__v4si)_mm_broadcast_i32x2(__A),
(__v4si)_mm_setzero_si128());
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_broadcast_i32x2 (__m128i __A)
{
return (__m256i) __builtin_ia32_broadcasti32x2_256_mask ((__v4si) __A,
(__v8si)_mm256_undefined_si256(),
(__mmask8) -1);
return (__m256i)__builtin_shufflevector((__v4si)__A,
(__v4si)_mm_undefined_si128(),
0, 1, 0, 1, 0, 1, 0, 1);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_broadcast_i32x2 (__m256i __O, __mmask8 __M, __m128i __A)
{
return (__m256i) __builtin_ia32_broadcasti32x2_256_mask ((__v4si) __A,
(__v8si) __O,
__M);
return (__m256i)__builtin_ia32_selectd_256((__mmask8)__M,
(__v8si)_mm256_broadcast_i32x2(__A),
(__v8si)__O);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_broadcast_i32x2 (__mmask8 __M, __m128i __A)
{
return (__m256i) __builtin_ia32_broadcasti32x2_256_mask ((__v4si) __A,
(__v8si) _mm256_setzero_si256 (),
__M);
return (__m256i)__builtin_ia32_selectd_256((__mmask8)__M,
(__v8si)_mm256_broadcast_i32x2(__A),
(__v8si)_mm256_setzero_si256());
}
static __inline__ __m256i __DEFAULT_FN_ATTRS

File diff suppressed because it is too large


@@ -0,0 +1,748 @@
/*===------------- avx512vlvbmi2intrin.h - VBMI2 intrinsics -----------------===
*
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error "Never use <avx512vlvbmi2intrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __AVX512VLVBMI2INTRIN_H
#define __AVX512VLVBMI2INTRIN_H
/* Define the default attributes for the functions in this file. */
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx512vl,avx512vbmi2")))
static __inline __m128i __DEFAULT_FN_ATTRS
_mm128_setzero_hi(void) {
return (__m128i)(__v8hi){ 0, 0, 0, 0, 0, 0, 0, 0 };
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_compress_epi16(__m128i __S, __mmask8 __U, __m128i __D)
{
return (__m128i) __builtin_ia32_compresshi128_mask ((__v8hi) __D,
(__v8hi) __S,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_compress_epi16(__mmask8 __U, __m128i __D)
{
return (__m128i) __builtin_ia32_compresshi128_mask ((__v8hi) __D,
(__v8hi) _mm128_setzero_hi(),
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_compress_epi8(__m128i __S, __mmask16 __U, __m128i __D)
{
return (__m128i) __builtin_ia32_compressqi128_mask ((__v16qi) __D,
(__v16qi) __S,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_compress_epi8(__mmask16 __U, __m128i __D)
{
return (__m128i) __builtin_ia32_compressqi128_mask ((__v16qi) __D,
(__v16qi) _mm128_setzero_hi(),
__U);
}
static __inline__ void __DEFAULT_FN_ATTRS
_mm128_mask_compressstoreu_epi16(void *__P, __mmask8 __U, __m128i __D)
{
__builtin_ia32_compressstorehi128_mask ((__v8hi *) __P, (__v8hi) __D,
__U);
}
static __inline__ void __DEFAULT_FN_ATTRS
_mm128_mask_compressstoreu_epi8(void *__P, __mmask16 __U, __m128i __D)
{
__builtin_ia32_compressstoreqi128_mask ((__v16qi *) __P, (__v16qi) __D,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_expand_epi16(__m128i __S, __mmask8 __U, __m128i __D)
{
return (__m128i) __builtin_ia32_expandhi128_mask ((__v8hi) __D,
(__v8hi) __S,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_expand_epi16(__mmask8 __U, __m128i __D)
{
return (__m128i) __builtin_ia32_expandhi128_mask ((__v8hi) __D,
(__v8hi) _mm128_setzero_hi(),
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_expand_epi8(__m128i __S, __mmask16 __U, __m128i __D)
{
return (__m128i) __builtin_ia32_expandqi128_mask ((__v16qi) __D,
(__v16qi) __S,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_expand_epi8(__mmask16 __U, __m128i __D)
{
return (__m128i) __builtin_ia32_expandqi128_mask ((__v16qi) __D,
(__v16qi) _mm128_setzero_hi(),
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_expandloadu_epi16(__m128i __S, __mmask8 __U, void const *__P)
{
return (__m128i) __builtin_ia32_expandloadhi128_mask ((const __v8hi *)__P,
(__v8hi) __S,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_expandloadu_epi16(__mmask8 __U, void const *__P)
{
return (__m128i) __builtin_ia32_expandloadhi128_mask ((const __v8hi *)__P,
(__v8hi) _mm128_setzero_hi(),
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_expandloadu_epi8(__m128i __S, __mmask16 __U, void const *__P)
{
return (__m128i) __builtin_ia32_expandloadqi128_mask ((const __v16qi *)__P,
(__v16qi) __S,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_expandloadu_epi8(__mmask16 __U, void const *__P)
{
return (__m128i) __builtin_ia32_expandloadqi128_mask ((const __v16qi *)__P,
(__v16qi) _mm128_setzero_hi(),
__U);
}
static __inline __m256i __DEFAULT_FN_ATTRS
_mm256_setzero_hi(void) {
return (__m256i)(__v16hi){ 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0 };
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_compress_epi16(__m256i __S, __mmask16 __U, __m256i __D)
{
return (__m256i) __builtin_ia32_compresshi256_mask ((__v16hi) __D,
(__v16hi) __S,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_compress_epi16(__mmask16 __U, __m256i __D)
{
return (__m256i) __builtin_ia32_compresshi256_mask ((__v16hi) __D,
(__v16hi) _mm256_setzero_hi(),
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_compress_epi8(__m256i __S, __mmask32 __U, __m256i __D)
{
return (__m256i) __builtin_ia32_compressqi256_mask ((__v32qi) __D,
(__v32qi) __S,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_compress_epi8(__mmask32 __U, __m256i __D)
{
return (__m256i) __builtin_ia32_compressqi256_mask ((__v32qi) __D,
(__v32qi) _mm256_setzero_hi(),
__U);
}
static __inline__ void __DEFAULT_FN_ATTRS
_mm256_mask_compressstoreu_epi16(void *__P, __mmask16 __U, __m256i __D)
{
__builtin_ia32_compressstorehi256_mask ((__v16hi *) __P, (__v16hi) __D,
__U);
}
static __inline__ void __DEFAULT_FN_ATTRS
_mm256_mask_compressstoreu_epi8(void *__P, __mmask32 __U, __m256i __D)
{
__builtin_ia32_compressstoreqi256_mask ((__v32qi *) __P, (__v32qi) __D,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_expand_epi16(__m256i __S, __mmask16 __U, __m256i __D)
{
return (__m256i) __builtin_ia32_expandhi256_mask ((__v16hi) __D,
(__v16hi) __S,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_expand_epi16(__mmask16 __U, __m256i __D)
{
return (__m256i) __builtin_ia32_expandhi256_mask ((__v16hi) __D,
(__v16hi) _mm256_setzero_hi(),
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_expand_epi8(__m256i __S, __mmask32 __U, __m256i __D)
{
return (__m256i) __builtin_ia32_expandqi256_mask ((__v32qi) __D,
(__v32qi) __S,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_expand_epi8(__mmask32 __U, __m256i __D)
{
return (__m256i) __builtin_ia32_expandqi256_mask ((__v32qi) __D,
(__v32qi) _mm256_setzero_hi(),
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_expandloadu_epi16(__m256i __S, __mmask16 __U, void const *__P)
{
return (__m256i) __builtin_ia32_expandloadhi256_mask ((const __v16hi *)__P,
(__v16hi) __S,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_expandloadu_epi16(__mmask16 __U, void const *__P)
{
return (__m256i) __builtin_ia32_expandloadhi256_mask ((const __v16hi *)__P,
(__v16hi) _mm256_setzero_hi(),
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_expandloadu_epi8(__m256i __S, __mmask32 __U, void const *__P)
{
return (__m256i) __builtin_ia32_expandloadqi256_mask ((const __v32qi *)__P,
(__v32qi) __S,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_expandloadu_epi8(__mmask32 __U, void const *__P)
{
return (__m256i) __builtin_ia32_expandloadqi256_mask ((const __v32qi *)__P,
(__v32qi) _mm256_setzero_hi(),
__U);
}
#define _mm256_mask_shldi_epi64(S, U, A, B, I) __extension__ ({ \
(__m256i)__builtin_ia32_vpshldq256_mask((__v4di)(A), \
(__v4di)(B), \
(int)(I), \
(__v4di)(S), \
(__mmask8)(U)); })
#define _mm256_maskz_shldi_epi64(U, A, B, I) \
_mm256_mask_shldi_epi64(_mm256_setzero_hi(), (U), (A), (B), (I))
#define _mm256_shldi_epi64(A, B, I) \
_mm256_mask_shldi_epi64(_mm256_undefined_si256(), (__mmask8)(-1), (A), (B), (I))
#define _mm128_mask_shldi_epi64(S, U, A, B, I) __extension__ ({ \
(__m128i)__builtin_ia32_vpshldq128_mask((__v2di)(A), \
(__v2di)(B), \
(int)(I), \
(__v2di)(S), \
(__mmask8)(U)); })
#define _mm128_maskz_shldi_epi64(U, A, B, I) \
_mm128_mask_shldi_epi64(_mm128_setzero_hi(), (U), (A), (B), (I))
#define _mm128_shldi_epi64(A, B, I) \
_mm128_mask_shldi_epi64(_mm_undefined_si128(), (__mmask8)(-1), (A), (B), (I))
#define _mm256_mask_shldi_epi32(S, U, A, B, I) __extension__ ({ \
(__m256i)__builtin_ia32_vpshldd256_mask((__v8si)(A), \
(__v8si)(B), \
(int)(I), \
(__v8si)(S), \
(__mmask8)(U)); })
#define _mm256_maskz_shldi_epi32(U, A, B, I) \
_mm256_mask_shldi_epi32(_mm256_setzero_hi(), (U), (A), (B), (I))
#define _mm256_shldi_epi32(A, B, I) \
_mm256_mask_shldi_epi32(_mm256_undefined_si256(), (__mmask8)(-1), (A), (B), (I))
#define _mm128_mask_shldi_epi32(S, U, A, B, I) __extension__ ({ \
(__m128i)__builtin_ia32_vpshldd128_mask((__v4si)(A), \
(__v4si)(B), \
(int)(I), \
(__v4si)(S), \
(__mmask8)(U)); })
#define _mm128_maskz_shldi_epi32(U, A, B, I) \
_mm128_mask_shldi_epi32(_mm128_setzero_hi(), (U), (A), (B), (I))
#define _mm128_shldi_epi32(A, B, I) \
_mm128_mask_shldi_epi32(_mm_undefined_si128(), (__mmask8)(-1), (A), (B), (I))
#define _mm256_mask_shldi_epi16(S, U, A, B, I) __extension__ ({ \
(__m256i)__builtin_ia32_vpshldw256_mask((__v16hi)(A), \
(__v16hi)(B), \
(int)(I), \
(__v16hi)(S), \
(__mmask16)(U)); })
#define _mm256_maskz_shldi_epi16(U, A, B, I) \
_mm256_mask_shldi_epi16(_mm256_setzero_hi(), (U), (A), (B), (I))
#define _mm256_shldi_epi16(A, B, I) \
_mm256_mask_shldi_epi16(_mm256_undefined_si256(), (__mmask16)(-1), (A), (B), (I))
#define _mm128_mask_shldi_epi16(S, U, A, B, I) __extension__ ({ \
(__m128i)__builtin_ia32_vpshldw128_mask((__v8hi)(A), \
(__v8hi)(B), \
(int)(I), \
(__v8hi)(S), \
(__mmask8)(U)); })
#define _mm128_maskz_shldi_epi16(U, A, B, I) \
_mm128_mask_shldi_epi16(_mm128_setzero_hi(), (U), (A), (B), (I))
#define _mm128_shldi_epi16(A, B, I) \
_mm128_mask_shldi_epi16(_mm_undefined_si128(), (__mmask8)(-1), (A), (B), (I))
#define _mm256_mask_shrdi_epi64(S, U, A, B, I) __extension__ ({ \
(__m256i)__builtin_ia32_vpshrdq256_mask((__v4di)(A), \
(__v4di)(B), \
(int)(I), \
(__v4di)(S), \
(__mmask8)(U)); })
#define _mm256_maskz_shrdi_epi64(U, A, B, I) \
_mm256_mask_shrdi_epi64(_mm256_setzero_hi(), (U), (A), (B), (I))
#define _mm256_shrdi_epi64(A, B, I) \
_mm256_mask_shrdi_epi64(_mm256_undefined_si256(), (__mmask8)(-1), (A), (B), (I))
#define _mm128_mask_shrdi_epi64(S, U, A, B, I) __extension__ ({ \
(__m128i)__builtin_ia32_vpshrdq128_mask((__v2di)(A), \
(__v2di)(B), \
(int)(I), \
(__v2di)(S), \
(__mmask8)(U)); })
#define _mm128_maskz_shrdi_epi64(U, A, B, I) \
_mm128_mask_shrdi_epi64(_mm128_setzero_hi(), (U), (A), (B), (I))
#define _mm128_shrdi_epi64(A, B, I) \
_mm128_mask_shrdi_epi64(_mm_undefined_si128(), (__mmask8)(-1), (A), (B), (I))
#define _mm256_mask_shrdi_epi32(S, U, A, B, I) __extension__ ({ \
(__m256i)__builtin_ia32_vpshrdd256_mask((__v8si)(A), \
(__v8si)(B), \
(int)(I), \
(__v8si)(S), \
(__mmask8)(U)); })
#define _mm256_maskz_shrdi_epi32(U, A, B, I) \
_mm256_mask_shrdi_epi32(_mm256_setzero_hi(), (U), (A), (B), (I))
#define _mm256_shrdi_epi32(A, B, I) \
_mm256_mask_shrdi_epi32(_mm256_undefined_si256(), (__mmask8)(-1), (A), (B), (I))
#define _mm128_mask_shrdi_epi32(S, U, A, B, I) __extension__ ({ \
(__m128i)__builtin_ia32_vpshrdd128_mask((__v4si)(A), \
(__v4si)(B), \
(int)(I), \
(__v4si)(S), \
(__mmask8)(U)); })
#define _mm128_maskz_shrdi_epi32(U, A, B, I) \
_mm128_mask_shrdi_epi32(_mm128_setzero_hi(), (U), (A), (B), (I))
#define _mm128_shrdi_epi32(A, B, I) \
_mm128_mask_shrdi_epi32(_mm_undefined_si128(), (__mmask8)(-1), (A), (B), (I))
#define _mm256_mask_shrdi_epi16(S, U, A, B, I) __extension__ ({ \
(__m256i)__builtin_ia32_vpshrdw256_mask((__v16hi)(A), \
(__v16hi)(B), \
(int)(I), \
(__v16hi)(S), \
(__mmask16)(U)); })
#define _mm256_maskz_shrdi_epi16(U, A, B, I) \
_mm256_mask_shrdi_epi16(_mm256_setzero_hi(), (U), (A), (B), (I))
#define _mm256_shrdi_epi16(A, B, I) \
_mm256_mask_shrdi_epi16(_mm256_undefined_si256(), (__mmask16)(-1), (A), (B), (I))
#define _mm128_mask_shrdi_epi16(S, U, A, B, I) __extension__ ({ \
(__m128i)__builtin_ia32_vpshrdw128_mask((__v8hi)(A), \
(__v8hi)(B), \
(int)(I), \
(__v8hi)(S), \
(__mmask8)(U)); })
#define _mm128_maskz_shrdi_epi16(U, A, B, I) \
_mm128_mask_shrdi_epi16(_mm128_setzero_hi(), (U), (A), (B), (I))
#define _mm128_shrdi_epi16(A, B, I) \
_mm128_mask_shrdi_epi16(_mm_undefined_si128(), (__mmask8)(-1), (A), (B), (I))
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_shldv_epi64(__m256i __S, __mmask8 __U, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshldvq256_mask ((__v4di) __S,
(__v4di) __A,
(__v4di) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_shldv_epi64(__mmask8 __U, __m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshldvq256_maskz ((__v4di) __S,
(__v4di) __A,
(__v4di) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_shldv_epi64(__m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshldvq256_mask ((__v4di) __S,
(__v4di) __A,
(__v4di) __B,
(__mmask8) -1);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_shldv_epi64(__m128i __S, __mmask8 __U, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshldvq128_mask ((__v2di) __S,
(__v2di) __A,
(__v2di) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_shldv_epi64(__mmask8 __U, __m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshldvq128_maskz ((__v2di) __S,
(__v2di) __A,
(__v2di) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_shldv_epi64(__m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshldvq128_mask ((__v2di) __S,
(__v2di) __A,
(__v2di) __B,
(__mmask8) -1);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_shldv_epi32(__m256i __S, __mmask8 __U, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshldvd256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_shldv_epi32(__mmask8 __U, __m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshldvd256_maskz ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_shldv_epi32(__m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshldvd256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) -1);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_shldv_epi32(__m128i __S, __mmask8 __U, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshldvd128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_shldv_epi32(__mmask8 __U, __m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshldvd128_maskz ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_shldv_epi32(__m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshldvd128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) -1);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_shldv_epi16(__m256i __S, __mmask16 __U, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshldvw256_mask ((__v16hi) __S,
(__v16hi) __A,
(__v16hi) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_shldv_epi16(__mmask16 __U, __m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshldvw256_maskz ((__v16hi) __S,
(__v16hi) __A,
(__v16hi) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_shldv_epi16(__m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshldvw256_mask ((__v16hi) __S,
(__v16hi) __A,
(__v16hi) __B,
(__mmask16) -1);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_shldv_epi16(__m128i __S, __mmask8 __U, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshldvw128_mask ((__v8hi) __S,
(__v8hi) __A,
(__v8hi) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_shldv_epi16(__mmask8 __U, __m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshldvw128_maskz ((__v8hi) __S,
(__v8hi) __A,
(__v8hi) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_shldv_epi16(__m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshldvw128_mask ((__v8hi) __S,
(__v8hi) __A,
(__v8hi) __B,
(__mmask8) -1);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_shrdv_epi64(__m256i __S, __mmask8 __U, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshrdvq256_mask ((__v4di) __S,
(__v4di) __A,
(__v4di) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_shrdv_epi64(__mmask8 __U, __m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshrdvq256_maskz ((__v4di) __S,
(__v4di) __A,
(__v4di) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_shrdv_epi64(__m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshrdvq256_mask ((__v4di) __S,
(__v4di) __A,
(__v4di) __B,
(__mmask8) -1);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_shrdv_epi64(__m128i __S, __mmask8 __U, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshrdvq128_mask ((__v2di) __S,
(__v2di) __A,
(__v2di) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_shrdv_epi64(__mmask8 __U, __m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshrdvq128_maskz ((__v2di) __S,
(__v2di) __A,
(__v2di) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_shrdv_epi64(__m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshrdvq128_mask ((__v2di) __S,
(__v2di) __A,
(__v2di) __B,
(__mmask8) -1);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_shrdv_epi32(__m256i __S, __mmask8 __U, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshrdvd256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_shrdv_epi32(__mmask8 __U, __m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshrdvd256_maskz ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_shrdv_epi32(__m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshrdvd256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) -1);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_shrdv_epi32(__m128i __S, __mmask8 __U, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshrdvd128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_shrdv_epi32(__mmask8 __U, __m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshrdvd128_maskz ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_shrdv_epi32(__m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshrdvd128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) -1);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_shrdv_epi16(__m256i __S, __mmask16 __U, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshrdvw256_mask ((__v16hi) __S,
(__v16hi) __A,
(__v16hi) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_shrdv_epi16(__mmask16 __U, __m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshrdvw256_maskz ((__v16hi) __S,
(__v16hi) __A,
(__v16hi) __B,
__U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_shrdv_epi16(__m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpshrdvw256_mask ((__v16hi) __S,
(__v16hi) __A,
(__v16hi) __B,
(__mmask16) -1);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_shrdv_epi16(__m128i __S, __mmask8 __U, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshrdvw128_mask ((__v8hi) __S,
(__v8hi) __A,
(__v8hi) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_shrdv_epi16(__mmask8 __U, __m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshrdvw128_maskz ((__v8hi) __S,
(__v8hi) __A,
(__v8hi) __B,
__U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_shrdv_epi16(__m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpshrdvw128_mask ((__v8hi) __S,
(__v8hi) __A,
(__v8hi) __B,
(__mmask8) -1);
}
#undef __DEFAULT_FN_ATTRS
#endif
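
The shldi macros above wrap a concatenate-and-shift operation: each lane of A supplies the high half and the matching lane of B the low half of a double-width value, which is shifted left and whose upper half is kept. Below is a hedged scalar sketch for 16-bit lanes, assuming the shift count is taken modulo the lane width as Intel's published pseudocode describes. The names `model_shld16`, `model_mask_shldi_epi16` and `model_shld_selfcheck` are hypothetical and not part of the header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane: concatenate a (high half)
   and b (low half), shift left by imm modulo 16, keep the upper 16 bits. */
static uint16_t model_shld16(uint16_t a, uint16_t b, unsigned imm)
{
    uint32_t wide = ((uint32_t)a << 16) | b;
    return (uint16_t)((wide << (imm & 15)) >> 16);
}

/* Merge-masked form: lane i is computed when bit i of u is set,
   otherwise the pass-through lane s[i] is kept. */
static void model_mask_shldi_epi16(uint16_t *dst, const uint16_t *s, uint32_t u,
                                   const uint16_t *a, const uint16_t *b,
                                   unsigned imm, int lanes)
{
    for (int i = 0; i < lanes; i++)
        dst[i] = ((u >> i) & 1) ? model_shld16(a[i], b[i], imm) : s[i];
}

/* Small self-check; returns 1 if every assertion holds. */
static int model_shld_selfcheck(void)
{
    /* 0x1234ABCD << 4 = 0x234ABCD0, upper half 0x234A */
    assert(model_shld16(0x1234, 0xABCD, 4) == 0x234A);
    /* a count of 0 or of 16 (which wraps to 0) returns a unchanged */
    assert(model_shld16(0x1234, 0xABCD, 0) == 0x1234);
    assert(model_shld16(0x1234, 0xABCD, 16) == 0x1234);
    return 1;
}
```

In this model the mask needs one bit per lane: with 16 lanes, a mask of `0x00FF` leaves lanes 8 to 15 at the pass-through value, while `0xFFFF` computes every lane.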


@@ -0,0 +1,254 @@
/*===------------- avx512vlvnniintrin.h - VNNI intrinsics ------------------===
*
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error "Never use <avx512vlvnniintrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __AVX512VLVNNIINTRIN_H
#define __AVX512VLVNNIINTRIN_H
/* Define the default attributes for the functions in this file. */
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx512vl,avx512vnni")))
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_dpbusd_epi32(__m256i __S, __mmask8 __U, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpbusd256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) __U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_dpbusd_epi32(__mmask8 __U, __m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpbusd256_maskz ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) __U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_dpbusd_epi32(__m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpbusd256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) -1);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_dpbusds_epi32(__m256i __S, __mmask8 __U, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpbusds256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) __U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_dpbusds_epi32(__mmask8 __U, __m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpbusds256_maskz ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) __U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_dpbusds_epi32(__m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpbusds256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) -1);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_dpwssd_epi32(__m256i __S, __mmask8 __U, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpwssd256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) __U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_dpwssd_epi32(__mmask8 __U, __m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpwssd256_maskz ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) __U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_dpwssd_epi32(__m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpwssd256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) -1);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_dpwssds_epi32(__m256i __S, __mmask8 __U, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpwssds256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) __U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_dpwssds_epi32(__mmask8 __U, __m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpwssds256_maskz ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) __U);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_dpwssds_epi32(__m256i __S, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vpdpwssds256_mask ((__v8si) __S,
(__v8si) __A,
(__v8si) __B,
(__mmask8) -1);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_dpbusd_epi32(__m128i __S, __mmask8 __U, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpbusd128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) __U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_dpbusd_epi32(__mmask8 __U, __m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpbusd128_maskz ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) __U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_dpbusd_epi32(__m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpbusd128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) -1);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_dpbusds_epi32(__m128i __S, __mmask8 __U, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpbusds128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) __U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_dpbusds_epi32(__mmask8 __U, __m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpbusds128_maskz ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) __U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_dpbusds_epi32(__m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpbusds128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) -1);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_dpwssd_epi32(__m128i __S, __mmask8 __U, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpwssd128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) __U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_dpwssd_epi32(__mmask8 __U, __m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpwssd128_maskz ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) __U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_dpwssd_epi32(__m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpwssd128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) -1);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_mask_dpwssds_epi32(__m128i __S, __mmask8 __U, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpwssds128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) __U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_maskz_dpwssds_epi32(__mmask8 __U, __m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpwssds128_maskz ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) __U);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm128_dpwssds_epi32(__m128i __S, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vpdpwssds128_mask ((__v4si) __S,
(__v4si) __A,
(__v4si) __B,
(__mmask8) -1);
}
#undef __DEFAULT_FN_ATTRS
#endif
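
The `_mask_`, `_maskz_` and unmasked forms above differ only in what happens to lanes whose mask bit is clear. A minimal scalar sketch of that selection for a single lane follows; the helper names `merge_mask_lane` and `zero_mask_lane` are hypothetical and are not part of the header. It assumes the usual AVX-512 convention that merge masking keeps the accumulator `__S` and zero masking writes 0.

```c
#include <assert.h>
#include <stdint.h>

/* Merge masking (the _mask_ forms): a lane whose mask bit is clear keeps
   the accumulator value s; otherwise it receives the computed result. */
static int32_t merge_mask_lane(int32_t s, int32_t computed, uint8_t k, int lane)
{
    return ((k >> lane) & 1) ? computed : s;
}

/* Zero masking (the _maskz_ forms): a lane whose mask bit is clear is
   set to zero; otherwise it receives the computed result. */
static int32_t zero_mask_lane(int32_t computed, uint8_t k, int lane)
{
    return ((k >> lane) & 1) ? computed : 0;
}
```

The unmasked intrinsics pass `(__mmask8) -1`, so every mask bit is set and every lane takes the computed value.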

@@ -0,0 +1,146 @@
/*===------------- avx512vnniintrin.h - VNNI intrinsics ------------------===
*
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error "Never use <avx512vnniintrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __AVX512VNNIINTRIN_H
#define __AVX512VNNIINTRIN_H
/* Define the default attributes for the functions in this file. */
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx512vnni")))
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_dpbusd_epi32(__m512i __S, __mmask16 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpbusd512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) __U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_dpbusd_epi32(__mmask16 __U, __m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpbusd512_maskz ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) __U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_dpbusd_epi32(__m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpbusd512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) -1);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_dpbusds_epi32(__m512i __S, __mmask16 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpbusds512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) __U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_dpbusds_epi32(__mmask16 __U, __m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpbusds512_maskz ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) __U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_dpbusds_epi32(__m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpbusds512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) -1);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_dpwssd_epi32(__m512i __S, __mmask16 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpwssd512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) __U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_dpwssd_epi32(__mmask16 __U, __m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpwssd512_maskz ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) __U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_dpwssd_epi32(__m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpwssd512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) -1);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_dpwssds_epi32(__m512i __S, __mmask16 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpwssds512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) __U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_maskz_dpwssds_epi32(__mmask16 __U, __m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpwssds512_maskz ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) __U);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_dpwssds_epi32(__m512i __S, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vpdpwssds512_mask ((__v16si) __S,
(__v16si) __A,
(__v16si) __B,
(__mmask16) -1);
}
#undef __DEFAULT_FN_ATTRS
#endif
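
The wrappers above only forward to builtins, so the arithmetic is not visible in the header. The sketch below is a scalar model of one 32-bit lane of the saturating forms, `dpbusds` and `dpwssds`. It assumes the published instruction descriptions: `dpbusd*` multiplies unsigned bytes of `__A` by signed bytes of `__B`, and `dpwssd*` multiplies signed 16-bit words. All function names here are hypothetical, and the non-saturating forms (which wrap on overflow instead of clamping) are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 8 bits of x. */
static int32_t sext8(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFu);
    return v >= 128 ? v - 256 : v;
}

/* Sign-extend the low 16 bits of x. */
static int32_t sext16(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFFFu);
    return v >= 32768 ? v - 65536 : v;
}

/* Clamp a 64-bit intermediate to the signed 32-bit range. */
static int32_t saturate32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* One lane of dpbusds: four (unsigned byte of a) * (signed byte of b)
   products are summed onto the accumulator s with signed saturation. */
static int32_t dpbusds_lane(int32_t s, uint32_t a, uint32_t b)
{
    int64_t acc = s;
    for (int i = 0; i < 4; ++i) {
        int32_t ua = (int32_t)((a >> (8 * i)) & 0xFFu);
        acc += (int64_t)ua * sext8(b >> (8 * i));
    }
    return saturate32(acc);
}

/* One lane of dpwssds: two (signed word of a) * (signed word of b)
   products are summed onto the accumulator s with signed saturation. */
static int32_t dpwssds_lane(int32_t s, uint32_t a, uint32_t b)
{
    int64_t acc = s;
    for (int i = 0; i < 2; ++i)
        acc += (int64_t)sext16(a >> (16 * i)) * sext16(b >> (16 * i));
    return saturate32(acc);
}
```

A 512-bit vector holds 16 such lanes, which is why these intrinsics take a `__mmask16`.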

@@ -0,0 +1,99 @@
/*===------------- avx512vpopcntdqintrin.h - AVX512VPOPCNTDQ intrinsics
*------------------===
*
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error \
"Never use <avx512vpopcntdqvlintrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __AVX512VPOPCNTDQVLINTRIN_H
#define __AVX512VPOPCNTDQVLINTRIN_H
/* Define the default attributes for the functions in this file. */
#define __DEFAULT_FN_ATTRS \
__attribute__((__always_inline__, __nodebug__, __target__("avx512vpopcntdq,avx512vl")))
static __inline__ __m128i __DEFAULT_FN_ATTRS _mm_popcnt_epi64(__m128i __A) {
return (__m128i)__builtin_ia32_vpopcntq_128((__v2di)__A);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_mask_popcnt_epi64(__m128i __W, __mmask8 __U, __m128i __A) {
return (__m128i)__builtin_ia32_selectq_128(
(__mmask8)__U, (__v2di)_mm_popcnt_epi64(__A), (__v2di)__W);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_maskz_popcnt_epi64(__mmask8 __U, __m128i __A) {
return _mm_mask_popcnt_epi64((__m128i)_mm_setzero_si128(), __U, __A);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS _mm_popcnt_epi32(__m128i __A) {
return (__m128i)__builtin_ia32_vpopcntd_128((__v4si)__A);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_mask_popcnt_epi32(__m128i __W, __mmask8 __U, __m128i __A) {
return (__m128i)__builtin_ia32_selectd_128(
(__mmask8)__U, (__v4si)_mm_popcnt_epi32(__A), (__v4si)__W);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_maskz_popcnt_epi32(__mmask8 __U, __m128i __A) {
return _mm_mask_popcnt_epi32((__m128i)_mm_setzero_si128(), __U, __A);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS _mm256_popcnt_epi64(__m256i __A) {
return (__m256i)__builtin_ia32_vpopcntq_256((__v4di)__A);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_popcnt_epi64(__m256i __W, __mmask8 __U, __m256i __A) {
return (__m256i)__builtin_ia32_selectq_256(
(__mmask8)__U, (__v4di)_mm256_popcnt_epi64(__A), (__v4di)__W);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_popcnt_epi64(__mmask8 __U, __m256i __A) {
return _mm256_mask_popcnt_epi64((__m256i)_mm256_setzero_si256(), __U, __A);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS _mm256_popcnt_epi32(__m256i __A) {
return (__m256i)__builtin_ia32_vpopcntd_256((__v8si)__A);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_mask_popcnt_epi32(__m256i __W, __mmask8 __U, __m256i __A) {
return (__m256i)__builtin_ia32_selectd_256(
(__mmask8)__U, (__v8si)_mm256_popcnt_epi32(__A), (__v8si)__W);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_maskz_popcnt_epi32(__mmask8 __U, __m256i __A) {
return _mm256_mask_popcnt_epi32((__m256i)_mm256_setzero_si256(), __U, __A);
}
#undef __DEFAULT_FN_ATTRS
#endif
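
The header above builds each masked population count from the unmasked one plus a select, and each `maskz` form from the `mask` form with a zero vector. A scalar sketch of the same layering for one 64-bit element is shown below; the function names are hypothetical and the loop is a plain reference count, not what the `vpopcntq` builtin lowers to.

```c
#include <assert.h>
#include <stdint.h>

/* Reference population count of one 64-bit element. */
static uint64_t popcnt64(uint64_t x)
{
    uint64_t n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Mirrors the _mask_ form: the element keeps w when its mask bit is clear. */
static uint64_t mask_popcnt64(uint64_t w, int mask_bit, uint64_t a)
{
    return mask_bit ? popcnt64(a) : w;
}

/* Mirrors the _maskz_ form: the _mask_ form with a zero pass-through value. */
static uint64_t maskz_popcnt64(int mask_bit, uint64_t a)
{
    return mask_popcnt64(0, mask_bit, a);
}
```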

c_headers/cetintrin.h (new file, 93 lines)

@@ -0,0 +1,93 @@
/*===---- cetintrin.h - CET intrinsic ------------------------------------===
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error "Never use <cetintrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __CETINTRIN_H
#define __CETINTRIN_H
/* Define the default attributes for the functions in this file. */
#define __DEFAULT_FN_ATTRS \
__attribute__((__always_inline__, __nodebug__, __target__("shstk")))
static __inline__ void __DEFAULT_FN_ATTRS _incsspd(int __a) {
__builtin_ia32_incsspd(__a);
}
#ifdef __x86_64__
static __inline__ void __DEFAULT_FN_ATTRS _incsspq(unsigned long long __a) {
__builtin_ia32_incsspq(__a);
}
#endif /* __x86_64__ */
static __inline__ unsigned int __DEFAULT_FN_ATTRS _rdsspd(unsigned int __a) {
return __builtin_ia32_rdsspd(__a);
}
#ifdef __x86_64__
static __inline__ unsigned long long __DEFAULT_FN_ATTRS _rdsspq(unsigned long long __a) {
return __builtin_ia32_rdsspq(__a);
}
#endif /* __x86_64__ */
static __inline__ void __DEFAULT_FN_ATTRS _saveprevssp() {
__builtin_ia32_saveprevssp();
}
static __inline__ void __DEFAULT_FN_ATTRS _rstorssp(void * __p) {
__builtin_ia32_rstorssp(__p);
}
static __inline__ void __DEFAULT_FN_ATTRS _wrssd(unsigned int __a, void * __p) {
__builtin_ia32_wrssd(__a, __p);
}
#ifdef __x86_64__
static __inline__ void __DEFAULT_FN_ATTRS _wrssq(unsigned long long __a, void * __p) {
__builtin_ia32_wrssq(__a, __p);
}
#endif /* __x86_64__ */
static __inline__ void __DEFAULT_FN_ATTRS _wrussd(unsigned int __a, void * __p) {
__builtin_ia32_wrussd(__a, __p);
}
#ifdef __x86_64__
static __inline__ void __DEFAULT_FN_ATTRS _wrussq(unsigned long long __a, void * __p) {
__builtin_ia32_wrussq(__a, __p);
}
#endif /* __x86_64__ */
static __inline__ void __DEFAULT_FN_ATTRS _setssbsy() {
__builtin_ia32_setssbsy();
}
static __inline__ void __DEFAULT_FN_ATTRS _clrssbsy(void * __p) {
__builtin_ia32_clrssbsy(__p);
}
#undef __DEFAULT_FN_ATTRS
#endif /* __CETINTRIN_H */

@@ -32,7 +32,7 @@
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("clflushopt")))
static __inline__ void __DEFAULT_FN_ATTRS
_mm_clflushopt(char * __m) {
_mm_clflushopt(void const * __m) {
__builtin_ia32_clflushopt(__m);
}

c_headers/clwbintrin.h (new file, 52 lines)

@@ -0,0 +1,52 @@
/*===---- clwbintrin.h - CLWB intrinsic ------------------------------------===
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error "Never use <clwbintrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __CLWBINTRIN_H
#define __CLWBINTRIN_H
/* Define the default attributes for the functions in this file. */
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("clwb")))
/// \brief Writes back to memory the cache line (if modified) that contains the
/// linear address specified in \a __p from any level of the cache hierarchy in
/// the cache coherence domain
///
/// \headerfile <immintrin.h>
///
/// This intrinsic corresponds to the <c> CLWB </c> instruction.
///
/// \param __p
/// A pointer to the memory location used to identify the cache line to be
/// written back.
static __inline__ void __DEFAULT_FN_ATTRS
_mm_clwb(void const *__p) {
__builtin_ia32_clwb(__p);
}
#undef __DEFAULT_FN_ATTRS
#endif

@@ -173,16 +173,24 @@
#define bit_AVX512VL 0x80000000
/* Features in %ecx for leaf 7 sub-leaf 0 */
#define bit_PREFTCHWT1 0x00000001
#define bit_AVX512VBMI 0x00000002
#define bit_PKU 0x00000004
#define bit_OSPKE 0x00000010
#define bit_PREFTCHWT1 0x00000001
#define bit_AVX512VBMI 0x00000002
#define bit_PKU 0x00000004
#define bit_OSPKE 0x00000010
#define bit_AVX512VBMI2 0x00000040
#define bit_SHSTK 0x00000080
#define bit_GFNI 0x00000100
#define bit_VAES 0x00000200
#define bit_VPCLMULQDQ 0x00000400
#define bit_AVX512VNNI 0x00000800
#define bit_AVX512BITALG 0x00001000
#define bit_AVX512VPOPCNTDQ 0x00004000
#define bit_RDPID 0x00400000
#define bit_RDPID 0x00400000
/* Features in %edx for leaf 7 sub-leaf 0 */
#define bit_AVX5124VNNIW 0x00000004
#define bit_AVX5124FMAPS 0x00000008
#define bit_IBT 0x00100000
/* Features in %eax for leaf 13 sub-leaf 1 */
#define bit_XSAVEOPT 0x00000001
@@ -192,6 +200,7 @@
/* Features in %ecx for leaf 0x80000001 */
#define bit_LAHF_LM 0x00000001
#define bit_ABM 0x00000020
#define bit_LZCNT bit_ABM /* for gcc compat */
#define bit_SSE4a 0x00000040
#define bit_PRFCHW 0x00000100
#define bit_XOP 0x00000800
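
The new `bit_*` constants are single-bit masks for a CPUID output register. The sketch below shows how such a mask is typically tested against a register value. The `EX_`-prefixed names are local copies made for this example (so they cannot clash with `<cpuid.h>`), and `has_feature` is a hypothetical helper, not something the header provides.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of three leaf 7, sub-leaf 0 %ecx masks from the hunk above. */
#define EX_BIT_AVX512VBMI2     0x00000040u
#define EX_BIT_AVX512VNNI      0x00000800u
#define EX_BIT_AVX512VPOPCNTDQ 0x00004000u

/* Returns 1 when the feature bit selected by mask is set in reg. */
static int has_feature(uint32_t reg, uint32_t mask)
{
    return (reg & mask) != 0;
}
```

On x86 with GCC or Clang, the `%ecx` value would normally come from `__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)` in `<cpuid.h>`. A set feature bit alone does not guarantee the extension is usable, since the operating system must also enable the corresponding register state.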

@@ -80,7 +80,7 @@ min(const __T &__a, const __T &__b, __Cmp __cmp) {
template <class __T>
inline __device__ const __T &
min(const __T &__a, const __T &__b) {
return __a < __b ? __b : __a;
return __a < __b ? __a : __b;
}
#ifdef _LIBCPP_END_NAMESPACE_STD

@@ -26,7 +26,6 @@
#include_next <new>
// Device overrides for placement new and delete.
#pragma push_macro("CUDA_NOEXCEPT")
#if __cplusplus >= 201103L
#define CUDA_NOEXCEPT noexcept
@@ -34,6 +33,55 @@
#define CUDA_NOEXCEPT
#endif
// Device overrides for non-placement new and delete.
__device__ inline void *operator new(__SIZE_TYPE__ size) {
if (size == 0) {
size = 1;
}
return ::malloc(size);
}
__device__ inline void *operator new(__SIZE_TYPE__ size,
const std::nothrow_t &) CUDA_NOEXCEPT {
return ::operator new(size);
}
__device__ inline void *operator new[](__SIZE_TYPE__ size) {
return ::operator new(size);
}
__device__ inline void *operator new[](__SIZE_TYPE__ size,
const std::nothrow_t &) {
return ::operator new(size);
}
__device__ inline void operator delete(void* ptr) CUDA_NOEXCEPT {
if (ptr) {
::free(ptr);
}
}
__device__ inline void operator delete(void *ptr,
const std::nothrow_t &) CUDA_NOEXCEPT {
::operator delete(ptr);
}
__device__ inline void operator delete[](void* ptr) CUDA_NOEXCEPT {
::operator delete(ptr);
}
__device__ inline void operator delete[](void *ptr,
const std::nothrow_t &) CUDA_NOEXCEPT {
::operator delete(ptr);
}
// Sized delete, C++14 only.
#if __cplusplus >= 201402L
__device__ void operator delete(void *ptr, __SIZE_TYPE__ size) CUDA_NOEXCEPT {
::operator delete(ptr);
}
__device__ void operator delete[](void *ptr, __SIZE_TYPE__ size) CUDA_NOEXCEPT {
::operator delete(ptr);
}
#endif
// Device overrides for placement new and delete.
__device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
return __ptr;
}
@@ -42,6 +90,7 @@ __device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT
}
__device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {}
__device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {}
#pragma pop_macro("CUDA_NOEXCEPT")
#endif // include guard

@@ -217,8 +217,8 @@ _mm_div_pd(__m128d __a, __m128d __b)
/// \brief Calculates the square root of the lower double-precision value of
/// the second operand and returns it in the lower 64 bits of the result.
/// The upper 64 bits of the result are copied from the upper double-
/// precision value of the first operand.
/// The upper 64 bits of the result are copied from the upper
/// double-precision value of the first operand.
///
/// \headerfile <x86intrin.h>
///
@@ -260,8 +260,8 @@ _mm_sqrt_pd(__m128d __a)
/// \brief Compares lower 64-bit double-precision values of both operands, and
/// returns the lesser of the pair of values in the lower 64-bits of the
/// result. The upper 64 bits of the result are copied from the upper double-
/// precision value of the first operand.
/// result. The upper 64 bits of the result are copied from the upper
/// double-precision value of the first operand.
///
/// \headerfile <x86intrin.h>
///
@@ -304,8 +304,8 @@ _mm_min_pd(__m128d __a, __m128d __b)
/// \brief Compares lower 64-bit double-precision values of both operands, and
/// returns the greater of the pair of values in the lower 64-bits of the
/// result. The upper 64 bits of the result are copied from the upper double-
/// precision value of the first operand.
/// result. The upper 64 bits of the result are copied from the upper
/// double-precision value of the first operand.
///
/// \headerfile <x86intrin.h>
///
@@ -983,8 +983,10 @@ _mm_cmpnge_sd(__m128d __a, __m128d __b)
}
/// \brief Compares the lower double-precision floating-point values in each of
/// the two 128-bit floating-point vectors of [2 x double] for equality. The
/// comparison yields 0 for false, 1 for true.
/// the two 128-bit floating-point vectors of [2 x double] for equality.
///
/// The comparison yields 0 for false, 1 for true. If either of the two
/// lower double-precision values is NaN, 0 is returned.
///
/// \headerfile <x86intrin.h>
///
@@ -996,7 +998,8 @@ _mm_cmpnge_sd(__m128d __a, __m128d __b)
/// \param __b
/// A 128-bit vector of [2 x double]. The lower double-precision value is
/// compared to the lower double-precision value of \a __a.
/// \returns An integer containing the comparison results.
/// \returns An integer containing the comparison results. If either of the two
/// lower double-precision values is NaN, 0 is returned.
static __inline__ int __DEFAULT_FN_ATTRS
_mm_comieq_sd(__m128d __a, __m128d __b)
{
@@ -1008,7 +1011,8 @@ _mm_comieq_sd(__m128d __a, __m128d __b)
/// the value in the first parameter is less than the corresponding value in
/// the second parameter.
///
/// The comparison yields 0 for false, 1 for true.
/// The comparison yields 0 for false, 1 for true. If either of the two
/// lower double-precision values is NaN, 0 is returned.
///
/// \headerfile <x86intrin.h>
///
@@ -1020,7 +1024,8 @@ _mm_comieq_sd(__m128d __a, __m128d __b)
/// \param __b
/// A 128-bit vector of [2 x double]. The lower double-precision value is
/// compared to the lower double-precision value of \a __a.
/// \returns An integer containing the comparison results.
/// \returns An integer containing the comparison results. If either of the two
/// lower double-precision values is NaN, 0 is returned.
static __inline__ int __DEFAULT_FN_ATTRS
_mm_comilt_sd(__m128d __a, __m128d __b)
{
@@ -1032,7 +1037,8 @@ _mm_comilt_sd(__m128d __a, __m128d __b)
/// the value in the first parameter is less than or equal to the
/// corresponding value in the second parameter.
///
/// The comparison yields 0 for false, 1 for true.
/// The comparison yields 0 for false, 1 for true. If either of the two
/// lower double-precision values is NaN, 0 is returned.
///
/// \headerfile <x86intrin.h>
///
@@ -1044,7 +1050,8 @@ _mm_comilt_sd(__m128d __a, __m128d __b)
/// \param __b
/// A 128-bit vector of [2 x double]. The lower double-precision value is
/// compared to the lower double-precision value of \a __a.
/// \returns An integer containing the comparison results.
/// \returns An integer containing the comparison results. If either of the two
/// lower double-precision values is NaN, 0 is returned.
static __inline__ int __DEFAULT_FN_ATTRS
_mm_comile_sd(__m128d __a, __m128d __b)
{
@@ -1056,7 +1063,8 @@ _mm_comile_sd(__m128d __a, __m128d __b)
/// the value in the first parameter is greater than the corresponding value
/// in the second parameter.
///
/// The comparison yields 0 for false, 1 for true.
/// The comparison yields 0 for false, 1 for true. If either of the two
/// lower double-precision values is NaN, 0 is returned.
///
/// \headerfile <x86intrin.h>
///
@@ -1068,7 +1076,8 @@ _mm_comile_sd(__m128d __a, __m128d __b)
/// \param __b
/// A 128-bit vector of [2 x double]. The lower double-precision value is
/// compared to the lower double-precision value of \a __a.
/// \returns An integer containing the comparison results.
/// \returns An integer containing the comparison results. If either of the two
/// lower double-precision values is NaN, 0 is returned.
static __inline__ int __DEFAULT_FN_ATTRS
_mm_comigt_sd(__m128d __a, __m128d __b)
{
@@ -1080,7 +1089,8 @@ _mm_comigt_sd(__m128d __a, __m128d __b)
/// the value in the first parameter is greater than or equal to the
/// corresponding value in the second parameter.
///
/// The comparison yields 0 for false, 1 for true.
/// The comparison yields 0 for false, 1 for true. If either of the two
/// lower double-precision values is NaN, 0 is returned.
///
/// \headerfile <x86intrin.h>
///
@@ -1092,7 +1102,8 @@ _mm_comigt_sd(__m128d __a, __m128d __b)
/// \param __b
/// A 128-bit vector of [2 x double]. The lower double-precision value is
/// compared to the lower double-precision value of \a __a.
/// \returns An integer containing the comparison results.
/// \returns An integer containing the comparison results. If either of the two
/// lower double-precision values is NaN, 0 is returned.
static __inline__ int __DEFAULT_FN_ATTRS
_mm_comige_sd(__m128d __a, __m128d __b)
{
@@ -1104,7 +1115,8 @@ _mm_comige_sd(__m128d __a, __m128d __b)
/// the value in the first parameter is unequal to the corresponding value in
/// the second parameter.
///
/// The comparison yields 0 for false, 1 for true.
/// The comparison yields 0 for false, 1 for true. If either of the two
/// lower double-precision values is NaN, 1 is returned.
///
/// \headerfile <x86intrin.h>
///
@@ -1116,7 +1128,8 @@ _mm_comige_sd(__m128d __a, __m128d __b)
/// \param __b
/// A 128-bit vector of [2 x double]. The lower double-precision value is
/// compared to the lower double-precision value of \a __a.
/// \returns An integer containing the comparison results.
/// \returns An integer containing the comparison results. If either of the two
/// lower double-precision values is NaN, 1 is returned.
static __inline__ int __DEFAULT_FN_ATTRS
_mm_comineq_sd(__m128d __a, __m128d __b)
{
@@ -1127,7 +1140,7 @@ _mm_comineq_sd(__m128d __a, __m128d __b)
/// the two 128-bit floating-point vectors of [2 x double] for equality. The
/// comparison yields 0 for false, 1 for true.
///
/// If either of the two lower double-precision values is NaN, 1 is returned.
/// If either of the two lower double-precision values is NaN, 0 is returned.
///
/// \headerfile <x86intrin.h>
///
@@ -1140,7 +1153,7 @@ _mm_comineq_sd(__m128d __a, __m128d __b)
/// A 128-bit vector of [2 x double]. The lower double-precision value is
/// compared to the lower double-precision value of \a __a.
/// \returns An integer containing the comparison results. If either of the two
/// lower double-precision values is NaN, 1 is returned.
/// lower double-precision values is NaN, 0 is returned.
static __inline__ int __DEFAULT_FN_ATTRS
_mm_ucomieq_sd(__m128d __a, __m128d __b)
{
@@ -1153,7 +1166,7 @@ _mm_ucomieq_sd(__m128d __a, __m128d __b)
/// the second parameter.
///
/// The comparison yields 0 for false, 1 for true. If either of the two lower
/// double-precision values is NaN, 1 is returned.
/// double-precision values is NaN, 0 is returned.
///
/// \headerfile <x86intrin.h>
///
@@ -1166,7 +1179,7 @@ _mm_ucomieq_sd(__m128d __a, __m128d __b)
/// A 128-bit vector of [2 x double]. The lower double-precision value is
/// compared to the lower double-precision value of \a __a.
/// \returns An integer containing the comparison results. If either of the two
/// lower double-precision values is NaN, 1 is returned.
/// lower double-precision values is NaN, 0 is returned.
static __inline__ int __DEFAULT_FN_ATTRS
_mm_ucomilt_sd(__m128d __a, __m128d __b)
{
@@ -1179,7 +1192,7 @@ _mm_ucomilt_sd(__m128d __a, __m128d __b)
/// corresponding value in the second parameter.
///
/// The comparison yields 0 for false, 1 for true. If either of the two lower
/// double-precision values is NaN, 1 is returned.
/// double-precision values is NaN, 0 is returned.
///
/// \headerfile <x86intrin.h>
///
@@ -1192,7 +1205,7 @@ _mm_ucomilt_sd(__m128d __a, __m128d __b)
/// A 128-bit vector of [2 x double]. The lower double-precision value is
/// compared to the lower double-precision value of \a __a.
/// \returns An integer containing the comparison results. If either of the two
/// lower double-precision values is NaN, 1 is returned.
/// lower double-precision values is NaN, 0 is returned.
static __inline__ int __DEFAULT_FN_ATTRS
_mm_ucomile_sd(__m128d __a, __m128d __b)
{
@@ -1257,7 +1270,7 @@ _mm_ucomige_sd(__m128d __a, __m128d __b)
/// the second parameter.
///
/// The comparison yields 0 for false, 1 for true. If either of the two lower
/// double-precision values is NaN, 0 is returned.
/// double-precision values is NaN, 1 is returned.
///
/// \headerfile <x86intrin.h>
///
@@ -1270,7 +1283,7 @@ _mm_ucomige_sd(__m128d __a, __m128d __b)
/// A 128-bit vector of [2 x double]. The lower double-precision value is
/// compared to the lower double-precision value of \a __a.
/// \returns An integer containing the comparison result. If either of the two
/// lower double-precision values is NaN, 0 is returned.
/// lower double-precision values is NaN, 1 is returned.
static __inline__ int __DEFAULT_FN_ATTRS
_mm_ucomineq_sd(__m128d __a, __m128d __b)
{
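
The corrected comments above give the NaN results for the scalar comparisons: 0 for the equality and ordering tests, 1 for the inequality tests. Those values match what the ordinary C comparison operators produce on IEEE 754 doubles, as the sketch below illustrates. The function names are hypothetical stand-ins for the lower-element comparison, and the sketch says nothing about the exception behaviour that separates the `comi` and `ucomi` families.

```c
#include <assert.h>
#include <math.h>

/* Equality on the lower element: false (0) when either input is NaN. */
static int scalar_eq(double a, double b)  { return a == b; }

/* Less-than on the lower element: false (0) when either input is NaN. */
static int scalar_lt(double a, double b)  { return a < b; }

/* Inequality on the lower element: true (1) when either input is NaN. */
static int scalar_neq(double a, double b) { return a != b; }
```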
@@ -1935,14 +1948,15 @@ _mm_store_pd(double *__dp, __m128d __a)
///
/// \headerfile <x86intrin.h>
///
/// This intrinsic corresponds to the <c>VMOVDDUP + VMOVAPD / MOVLHPS + MOVAPS </c> instruction.
/// This intrinsic corresponds to the
/// <c> VMOVDDUP + VMOVAPD / MOVLHPS + MOVAPS </c> instruction.
///
/// \param __dp
/// A pointer to a memory location that can store two double-precision
/// values.
/// \param __a
/// A 128-bit vector of [2 x double] whose lower 64 bits are copied to each
/// of the values in \a dp.
/// of the values in \a __dp.
static __inline__ void __DEFAULT_FN_ATTRS
_mm_store1_pd(double *__dp, __m128d __a)
{
@@ -1950,18 +1964,20 @@ _mm_store1_pd(double *__dp, __m128d __a)
_mm_store_pd(__dp, __a);
}
/// \brief Stores a 128-bit vector of [2 x double] into an aligned memory
/// location.
/// \brief Moves the lower 64 bits of a 128-bit vector of [2 x double] twice to
/// the upper and lower 64 bits of a memory location.
///
/// \headerfile <x86intrin.h>
///
/// This intrinsic corresponds to the <c> VMOVAPD / MOVAPD </c> instruction.
/// This intrinsic corresponds to the
/// <c> VMOVDDUP + VMOVAPD / MOVLHPS + MOVAPS </c> instruction.
///
/// \param __dp
/// A pointer to a 128-bit memory location. The address of the memory
/// location has to be 16-byte aligned.
/// A pointer to a memory location that can store two double-precision
/// values.
/// \param __a
/// A 128-bit vector of [2 x double] containing the values to be stored.
/// A 128-bit vector of [2 x double] whose lower 64 bits are copied to each
/// of the values in \a __dp.
static __inline__ void __DEFAULT_FN_ATTRS
_mm_store_pd1(double *__dp, __m128d __a)
{
@@ -2258,7 +2274,11 @@ _mm_adds_epu16(__m128i __a, __m128i __b)
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_avg_epu8(__m128i __a, __m128i __b)
{
return (__m128i)__builtin_ia32_pavgb128((__v16qi)__a, (__v16qi)__b);
typedef unsigned short __v16hu __attribute__ ((__vector_size__ (32)));
return (__m128i)__builtin_convertvector(
((__builtin_convertvector((__v16qu)__a, __v16hu) +
__builtin_convertvector((__v16qu)__b, __v16hu)) + 1)
>> 1, __v16qu);
}
/// \brief Computes the rounded averages of corresponding elements of two
@@ -2278,7 +2298,11 @@ _mm_avg_epu8(__m128i __a, __m128i __b)
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_avg_epu16(__m128i __a, __m128i __b)
{
return (__m128i)__builtin_ia32_pavgw128((__v8hi)__a, (__v8hi)__b);
typedef unsigned int __v8su __attribute__ ((__vector_size__ (32)));
return (__m128i)__builtin_convertvector(
((__builtin_convertvector((__v8hu)__a, __v8su) +
__builtin_convertvector((__v8hu)__b, __v8su)) + 1)
>> 1, __v8hu);
}
/// \brief Multiplies the corresponding elements of two 128-bit signed [8 x i16]
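
The replacement bodies for `_mm_avg_epu8` and `_mm_avg_epu16` widen each element, add, add 1, shift right by one, and narrow again. A scalar sketch of that rounded average for a single element is given below, with hypothetical function names. The widening matters because `a + b + 1` can exceed the range of the element type.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded average of two unsigned bytes, computed in 16 bits so that
   the intermediate sum (at most 511) cannot overflow. */
static uint8_t avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((uint16_t)a + (uint16_t)b + 1u) >> 1);
}

/* Rounded average of two unsigned 16-bit values, computed in 32 bits. */
static uint16_t avg_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + (uint32_t)b + 1u) >> 1);
}
```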
@@ -3838,8 +3862,7 @@ _mm_set1_epi8(char __b)
///
/// \headerfile <x86intrin.h>
///
/// This intrinsic corresponds to the <c> VPUNPCKLQDQ / PUNPCKLQDQ </c>
/// instruction.
/// This intrinsic does not correspond to a specific instruction.
///
/// \param __q0
/// A 64-bit integral value used to initialize the lower 64 bits of the
@@ -4010,7 +4033,7 @@ _mm_storeu_si128(__m128i *__p, __m128i __b)
/// specified unaligned memory location. When a mask bit is 1, the
/// corresponding byte is written, otherwise it is not written.
///
/// To minimize caching, the date is flagged as non-temporal (unlikely to be
/// To minimize caching, the data is flagged as non-temporal (unlikely to be
/// used again soon). Exception and trap behavior for elements not selected
/// for storage to memory are implementation dependent.
///
@@ -4524,8 +4547,8 @@ _mm_unpackhi_epi32(__m128i __a, __m128i __b)
return (__m128i)__builtin_shufflevector((__v4si)__a, (__v4si)__b, 2, 4+2, 3, 4+3);
}
/// \brief Unpacks the high-order (odd-indexed) values from two 128-bit vectors
/// of [2 x i64] and interleaves them into a 128-bit vector of [2 x i64].
/// \brief Unpacks the high-order 64-bit elements from two 128-bit vectors of
/// [2 x i64] and interleaves them into a 128-bit vector of [2 x i64].
///
/// \headerfile <x86intrin.h>
///
@@ -4657,7 +4680,7 @@ _mm_unpacklo_epi64(__m128i __a, __m128i __b)
///
/// \headerfile <x86intrin.h>
///
/// This intrinsic has no corresponding instruction.
/// This intrinsic corresponds to the <c> MOVDQ2Q </c> instruction.
///
/// \param __a
/// A 128-bit integer vector operand. The lower 64 bits are moved to the
@@ -4674,7 +4697,7 @@ _mm_movepi64_pi64(__m128i __a)
///
/// \headerfile <x86intrin.h>
///
/// This intrinsic corresponds to the <c> VMOVQ / MOVQ / MOVD </c> instruction.
/// This intrinsic corresponds to the <c> MOVD+VMOVQ </c> instruction.
///
/// \param __a
/// A 64-bit value.
@@ -4704,8 +4727,8 @@ _mm_move_epi64(__m128i __a)
return __builtin_shufflevector((__v2di)__a, (__m128i){ 0 }, 0, 2);
}
/// \brief Unpacks the high-order (odd-indexed) values from two 128-bit vectors
/// of [2 x double] and interleaves them into a 128-bit vector of [2 x
/// \brief Unpacks the high-order 64-bit elements from two 128-bit vectors of
/// [2 x double] and interleaves them into a 128-bit vector of [2 x
/// double].
///
/// \headerfile <x86intrin.h>
@@ -4725,7 +4748,7 @@ _mm_unpackhi_pd(__m128d __a, __m128d __b)
return __builtin_shufflevector((__v2df)__a, (__v2df)__b, 1, 2+1);
}
/// \brief Unpacks the low-order (even-indexed) values from two 128-bit vectors
/// \brief Unpacks the low-order 64-bit elements from two 128-bit vectors
/// of [2 x double] and interleaves them into a 128-bit vector of [2 x
/// double].
///
@@ -4784,9 +4807,9 @@ _mm_movemask_pd(__m128d __a)
/// A 128-bit vector of [2 x double].
/// \param i
/// An 8-bit immediate value. The least significant two bits specify which
/// elements to copy from a and b: \n
/// Bit[0] = 0: lower element of a copied to lower element of result. \n
/// Bit[0] = 1: upper element of a copied to lower element of result. \n
/// elements to copy from \a a and \a b: \n
/// Bit[0] = 0: lower element of \a a copied to lower element of result. \n
/// Bit[0] = 1: upper element of \a a copied to lower element of result. \n
/// Bit[1] = 0: lower element of \a b copied to upper element of result. \n
/// Bit[1] = 1: upper element of \a b copied to upper element of result. \n
/// \returns A 128-bit vector of [2 x double] containing the shuffled values.
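
Note on the _mm_avg_epu8 / _mm_avg_epu16 hunks above: the new bodies drop the PAVGB/PAVGW builtins and compute the rounded average with generic vector arithmetic (widen each lane, add, add 1, shift right by one, narrow). A scalar sketch of that per-lane computation follows. The helper names avg_round_u8 and avg_round_u16 are hypothetical and do not appear in the header.

```c
#include <stdint.h>

/* Hypothetical scalar helper: one 8-bit lane of the rounded average.
   The sum is formed in 16 bits so that a + b + 1 cannot overflow,
   mirroring the widening __builtin_convertvector step in the diff. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    uint16_t wide = (uint16_t)((uint16_t)a + (uint16_t)b + 1u);
    return (uint8_t)(wide >> 1);
}

/* Same idea for one 16-bit lane, widened to 32 bits. */
static uint16_t avg_round_u16(uint16_t a, uint16_t b)
{
    uint32_t wide = (uint32_t)a + (uint32_t)b + 1u;
    return (uint16_t)(wide >> 1);
}
```

The widening is the point of the sketch: adding two maximal lanes in their own width would wrap before the shift.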


@@ -143,4 +143,18 @@
# define LDBL_DECIMAL_DIG __LDBL_DECIMAL_DIG__
#endif
#ifdef __STDC_WANT_IEC_60559_TYPES_EXT__
# define FLT16_MANT_DIG __FLT16_MANT_DIG__
# define FLT16_DECIMAL_DIG __FLT16_DECIMAL_DIG__
# define FLT16_DIG __FLT16_DIG__
# define FLT16_MIN_EXP __FLT16_MIN_EXP__
# define FLT16_MIN_10_EXP __FLT16_MIN_10_EXP__
# define FLT16_MAX_EXP __FLT16_MAX_EXP__
# define FLT16_MAX_10_EXP __FLT16_MAX_10_EXP__
# define FLT16_MAX __FLT16_MAX__
# define FLT16_EPSILON __FLT16_EPSILON__
# define FLT16_MIN __FLT16_MIN__
# define FLT16_TRUE_MIN __FLT16_TRUE_MIN__
#endif /* __STDC_WANT_IEC_60559_TYPES_EXT__ */
#endif /* __FLOAT_H */


@@ -60,73 +60,73 @@ _mm_macc_sd(__m128d __A, __m128d __B, __m128d __C)
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_msub_ps(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfmsubps((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddps((__v4sf)__A, (__v4sf)__B, -(__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_msub_pd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfmsubpd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddpd((__v2df)__A, (__v2df)__B, -(__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_msub_ss(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfmsubss((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddss((__v4sf)__A, (__v4sf)__B, -(__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_msub_sd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfmsubsd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddsd((__v2df)__A, (__v2df)__B, -(__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_nmacc_ps(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfnmaddps((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddps(-(__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_nmacc_pd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfnmaddpd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddpd(-(__v2df)__A, (__v2df)__B, (__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_nmacc_ss(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfnmaddss((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddss(-(__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_nmacc_sd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfnmaddsd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddsd(-(__v2df)__A, (__v2df)__B, (__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_nmsub_ps(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfnmsubps((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddps(-(__v4sf)__A, (__v4sf)__B, -(__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_nmsub_pd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfnmsubpd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddpd(-(__v2df)__A, (__v2df)__B, -(__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_nmsub_ss(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfnmsubss((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddss(-(__v4sf)__A, (__v4sf)__B, -(__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_nmsub_sd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfnmsubsd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddsd(-(__v2df)__A, (__v2df)__B, -(__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
@@ -144,13 +144,13 @@ _mm_maddsub_pd(__m128d __A, __m128d __B, __m128d __C)
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_msubadd_ps(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfmsubaddps((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddsubps((__v4sf)__A, (__v4sf)__B, -(__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_msubadd_pd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfmsubaddpd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddsubpd((__v2df)__A, (__v2df)__B, -(__v2df)__C);
}
static __inline__ __m256 __DEFAULT_FN_ATTRS
@@ -168,37 +168,37 @@ _mm256_macc_pd(__m256d __A, __m256d __B, __m256d __C)
static __inline__ __m256 __DEFAULT_FN_ATTRS
_mm256_msub_ps(__m256 __A, __m256 __B, __m256 __C)
{
return (__m256)__builtin_ia32_vfmsubps256((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
return (__m256)__builtin_ia32_vfmaddps256((__v8sf)__A, (__v8sf)__B, -(__v8sf)__C);
}
static __inline__ __m256d __DEFAULT_FN_ATTRS
_mm256_msub_pd(__m256d __A, __m256d __B, __m256d __C)
{
return (__m256d)__builtin_ia32_vfmsubpd256((__v4df)__A, (__v4df)__B, (__v4df)__C);
return (__m256d)__builtin_ia32_vfmaddpd256((__v4df)__A, (__v4df)__B, -(__v4df)__C);
}
static __inline__ __m256 __DEFAULT_FN_ATTRS
_mm256_nmacc_ps(__m256 __A, __m256 __B, __m256 __C)
{
return (__m256)__builtin_ia32_vfnmaddps256((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
return (__m256)__builtin_ia32_vfmaddps256(-(__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
}
static __inline__ __m256d __DEFAULT_FN_ATTRS
_mm256_nmacc_pd(__m256d __A, __m256d __B, __m256d __C)
{
return (__m256d)__builtin_ia32_vfnmaddpd256((__v4df)__A, (__v4df)__B, (__v4df)__C);
return (__m256d)__builtin_ia32_vfmaddpd256(-(__v4df)__A, (__v4df)__B, (__v4df)__C);
}
static __inline__ __m256 __DEFAULT_FN_ATTRS
_mm256_nmsub_ps(__m256 __A, __m256 __B, __m256 __C)
{
return (__m256)__builtin_ia32_vfnmsubps256((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
return (__m256)__builtin_ia32_vfmaddps256(-(__v8sf)__A, (__v8sf)__B, -(__v8sf)__C);
}
static __inline__ __m256d __DEFAULT_FN_ATTRS
_mm256_nmsub_pd(__m256d __A, __m256d __B, __m256d __C)
{
return (__m256d)__builtin_ia32_vfnmsubpd256((__v4df)__A, (__v4df)__B, (__v4df)__C);
return (__m256d)__builtin_ia32_vfmaddpd256(-(__v4df)__A, (__v4df)__B, -(__v4df)__C);
}
static __inline__ __m256 __DEFAULT_FN_ATTRS
@@ -216,13 +216,13 @@ _mm256_maddsub_pd(__m256d __A, __m256d __B, __m256d __C)
static __inline__ __m256 __DEFAULT_FN_ATTRS
_mm256_msubadd_ps(__m256 __A, __m256 __B, __m256 __C)
{
return (__m256)__builtin_ia32_vfmsubaddps256((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
return (__m256)__builtin_ia32_vfmaddsubps256((__v8sf)__A, (__v8sf)__B, -(__v8sf)__C);
}
static __inline__ __m256d __DEFAULT_FN_ATTRS
_mm256_msubadd_pd(__m256d __A, __m256d __B, __m256d __C)
{
return (__m256d)__builtin_ia32_vfmsubaddpd256((__v4df)__A, (__v4df)__B, (__v4df)__C);
return (__m256d)__builtin_ia32_vfmaddsubpd256((__v4df)__A, (__v4df)__B, -(__v4df)__C);
}
#undef __DEFAULT_FN_ATTRS
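
Note on the hunks above: every msub, nmacc, nmsub and msubadd variant is now expressed through the corresponding multiply-add builtin with negated operands, instead of a dedicated builtin per sign combination. The sign identities can be sketched in scalar C as below. The function names are hypothetical, and plain `a * b + c` stands in for the builtin, so the sketch is not guaranteed to be fused and only illustrates the signs, not the single-rounding behaviour.

```c
/* Hypothetical scalar stand-in for the multiply-add builtin.
   Not guaranteed to be fused; used only to show the sign identities. */
static double macc_ref(double a, double b, double c)
{
    return a * b + c;
}

/* a * b - c, written as a multiply-add with c negated. */
static double msub_ref(double a, double b, double c)
{
    return macc_ref(a, b, -c);
}

/* -(a * b) + c, written as a multiply-add with a negated. */
static double nmacc_ref(double a, double b, double c)
{
    return macc_ref(-a, b, c);
}

/* -(a * b) - c, written as a multiply-add with a and c negated. */
static double nmsub_ref(double a, double b, double c)
{
    return macc_ref(-a, b, -c);
}
```

With a = 2, b = 3, c = 4 the three wrappers give 2, -2 and -10 respectively.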


@@ -46,85 +46,85 @@ _mm_fmadd_pd(__m128d __A, __m128d __B, __m128d __C)
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_fmadd_ss(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfmaddss((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddss3((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_fmadd_sd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfmaddsd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddsd3((__v2df)__A, (__v2df)__B, (__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_fmsub_ps(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfmsubps((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddps((__v4sf)__A, (__v4sf)__B, -(__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_fmsub_pd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfmsubpd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddpd((__v2df)__A, (__v2df)__B, -(__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_fmsub_ss(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfmsubss((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddss3((__v4sf)__A, (__v4sf)__B, -(__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_fmsub_sd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfmsubsd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddsd3((__v2df)__A, (__v2df)__B, -(__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_fnmadd_ps(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfnmaddps((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddps(-(__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_fnmadd_pd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfnmaddpd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddpd(-(__v2df)__A, (__v2df)__B, (__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_fnmadd_ss(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfnmaddss((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddss3((__v4sf)__A, -(__v4sf)__B, (__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_fnmadd_sd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfnmaddsd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddsd3((__v2df)__A, -(__v2df)__B, (__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_fnmsub_ps(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfnmsubps((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddps(-(__v4sf)__A, (__v4sf)__B, -(__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_fnmsub_pd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfnmsubpd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddpd(-(__v2df)__A, (__v2df)__B, -(__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_fnmsub_ss(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfnmsubss((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddss3((__v4sf)__A, -(__v4sf)__B, -(__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_fnmsub_sd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfnmsubsd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddsd3((__v2df)__A, -(__v2df)__B, -(__v2df)__C);
}
static __inline__ __m128 __DEFAULT_FN_ATTRS
@@ -142,13 +142,13 @@ _mm_fmaddsub_pd(__m128d __A, __m128d __B, __m128d __C)
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_fmsubadd_ps(__m128 __A, __m128 __B, __m128 __C)
{
return (__m128)__builtin_ia32_vfmsubaddps((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
return (__m128)__builtin_ia32_vfmaddsubps((__v4sf)__A, (__v4sf)__B, -(__v4sf)__C);
}
static __inline__ __m128d __DEFAULT_FN_ATTRS
_mm_fmsubadd_pd(__m128d __A, __m128d __B, __m128d __C)
{
return (__m128d)__builtin_ia32_vfmsubaddpd((__v2df)__A, (__v2df)__B, (__v2df)__C);
return (__m128d)__builtin_ia32_vfmaddsubpd((__v2df)__A, (__v2df)__B, -(__v2df)__C);
}
static __inline__ __m256 __DEFAULT_FN_ATTRS
@@ -166,37 +166,37 @@ _mm256_fmadd_pd(__m256d __A, __m256d __B, __m256d __C)
static __inline__ __m256 __DEFAULT_FN_ATTRS
_mm256_fmsub_ps(__m256 __A, __m256 __B, __m256 __C)
{
return (__m256)__builtin_ia32_vfmsubps256((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
return (__m256)__builtin_ia32_vfmaddps256((__v8sf)__A, (__v8sf)__B, -(__v8sf)__C);
}
static __inline__ __m256d __DEFAULT_FN_ATTRS
_mm256_fmsub_pd(__m256d __A, __m256d __B, __m256d __C)
{
return (__m256d)__builtin_ia32_vfmsubpd256((__v4df)__A, (__v4df)__B, (__v4df)__C);
return (__m256d)__builtin_ia32_vfmaddpd256((__v4df)__A, (__v4df)__B, -(__v4df)__C);
}
static __inline__ __m256 __DEFAULT_FN_ATTRS
_mm256_fnmadd_ps(__m256 __A, __m256 __B, __m256 __C)
{
return (__m256)__builtin_ia32_vfnmaddps256((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
return (__m256)__builtin_ia32_vfmaddps256(-(__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
}
static __inline__ __m256d __DEFAULT_FN_ATTRS
_mm256_fnmadd_pd(__m256d __A, __m256d __B, __m256d __C)
{
return (__m256d)__builtin_ia32_vfnmaddpd256((__v4df)__A, (__v4df)__B, (__v4df)__C);
return (__m256d)__builtin_ia32_vfmaddpd256(-(__v4df)__A, (__v4df)__B, (__v4df)__C);
}
static __inline__ __m256 __DEFAULT_FN_ATTRS
_mm256_fnmsub_ps(__m256 __A, __m256 __B, __m256 __C)
{
return (__m256)__builtin_ia32_vfnmsubps256((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
return (__m256)__builtin_ia32_vfmaddps256(-(__v8sf)__A, (__v8sf)__B, -(__v8sf)__C);
}
static __inline__ __m256d __DEFAULT_FN_ATTRS
_mm256_fnmsub_pd(__m256d __A, __m256d __B, __m256d __C)
{
return (__m256d)__builtin_ia32_vfnmsubpd256((__v4df)__A, (__v4df)__B, (__v4df)__C);
return (__m256d)__builtin_ia32_vfmaddpd256(-(__v4df)__A, (__v4df)__B, -(__v4df)__C);
}
static __inline__ __m256 __DEFAULT_FN_ATTRS
@@ -214,13 +214,13 @@ _mm256_fmaddsub_pd(__m256d __A, __m256d __B, __m256d __C)
static __inline__ __m256 __DEFAULT_FN_ATTRS
_mm256_fmsubadd_ps(__m256 __A, __m256 __B, __m256 __C)
{
return (__m256)__builtin_ia32_vfmsubaddps256((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
return (__m256)__builtin_ia32_vfmaddsubps256((__v8sf)__A, (__v8sf)__B, -(__v8sf)__C);
}
static __inline__ __m256d __DEFAULT_FN_ATTRS
_mm256_fmsubadd_pd(__m256d __A, __m256d __B, __m256d __C)
{
return (__m256d)__builtin_ia32_vfmsubaddpd256((__v4df)__A, (__v4df)__B, (__v4df)__C);
return (__m256d)__builtin_ia32_vfmaddsubpd256((__v4df)__A, (__v4df)__B, -(__v4df)__C);
}
#undef __DEFAULT_FN_ATTRS

202
c_headers/gfniintrin.h Normal file

@@ -0,0 +1,202 @@
/*===----------------- gfniintrin.h - GFNI intrinsics ----------------------===
*
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error "Never use <gfniintrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __GFNIINTRIN_H
#define __GFNIINTRIN_H
#define _mm_gf2p8affineinv_epi64_epi8(A, B, I) __extension__ ({ \
(__m128i)__builtin_ia32_vgf2p8affineinvqb_v16qi((__v16qi)(__m128i)(A), \
(__v16qi)(__m128i)(B), \
(char)(I)); })
#define _mm_mask_gf2p8affineinv_epi64_epi8(S, U, A, B, I) __extension__ ({ \
(__m128i)__builtin_ia32_selectb_128((__mmask16)(U), \
(__v16qi)_mm_gf2p8affineinv_epi64_epi8(A, B, I), \
(__v16qi)(__m128i)(S)); })
#define _mm_maskz_gf2p8affineinv_epi64_epi8(U, A, B, I) __extension__ ({ \
(__m128i)_mm_mask_gf2p8affineinv_epi64_epi8((__m128i)_mm_setzero_si128(), \
U, A, B, I); })
#define _mm256_gf2p8affineinv_epi64_epi8(A, B, I) __extension__ ({ \
(__m256i)__builtin_ia32_vgf2p8affineinvqb_v32qi((__v32qi)(__m256i)(A), \
(__v32qi)(__m256i)(B), \
(char)(I)); })
#define _mm256_mask_gf2p8affineinv_epi64_epi8(S, U, A, B, I) __extension__ ({ \
(__m256i)__builtin_ia32_selectb_256((__mmask32)(U), \
(__v32qi)_mm256_gf2p8affineinv_epi64_epi8(A, B, I), \
(__v32qi)(__m256i)(S)); })
#define _mm256_maskz_gf2p8affineinv_epi64_epi8(U, A, B, I) __extension__ ({ \
(__m256i)_mm256_mask_gf2p8affineinv_epi64_epi8((__m256i)_mm256_setzero_si256(), \
U, A, B, I); })
#define _mm512_gf2p8affineinv_epi64_epi8(A, B, I) __extension__ ({ \
(__m512i)__builtin_ia32_vgf2p8affineinvqb_v64qi((__v64qi)(__m512i)(A), \
(__v64qi)(__m512i)(B), \
(char)(I)); })
#define _mm512_mask_gf2p8affineinv_epi64_epi8(S, U, A, B, I) __extension__ ({ \
(__m512i)__builtin_ia32_selectb_512((__mmask64)(U), \
(__v64qi)_mm512_gf2p8affineinv_epi64_epi8(A, B, I), \
(__v64qi)(__m512i)(S)); })
#define _mm512_maskz_gf2p8affineinv_epi64_epi8(U, A, B, I) __extension__ ({ \
(__m512i)_mm512_mask_gf2p8affineinv_epi64_epi8((__m512i)_mm512_setzero_qi(), \
U, A, B, I); })
#define _mm_gf2p8affine_epi64_epi8(A, B, I) __extension__ ({ \
(__m128i)__builtin_ia32_vgf2p8affineqb_v16qi((__v16qi)(__m128i)(A), \
(__v16qi)(__m128i)(B), \
(char)(I)); })
#define _mm_mask_gf2p8affine_epi64_epi8(S, U, A, B, I) __extension__ ({ \
(__m128i)__builtin_ia32_selectb_128((__mmask16)(U), \
(__v16qi)_mm_gf2p8affine_epi64_epi8(A, B, I), \
(__v16qi)(__m128i)(S)); })
#define _mm_maskz_gf2p8affine_epi64_epi8(U, A, B, I) __extension__ ({ \
(__m128i)_mm_mask_gf2p8affine_epi64_epi8((__m128i)_mm_setzero_si128(), \
U, A, B, I); })
#define _mm256_gf2p8affine_epi64_epi8(A, B, I) __extension__ ({ \
(__m256i)__builtin_ia32_vgf2p8affineqb_v32qi((__v32qi)(__m256i)(A), \
(__v32qi)(__m256i)(B), \
(char)(I)); })
#define _mm256_mask_gf2p8affine_epi64_epi8(S, U, A, B, I) __extension__ ({ \
(__m256i)__builtin_ia32_selectb_256((__mmask32)(U), \
(__v32qi)_mm256_gf2p8affine_epi64_epi8(A, B, I), \
(__v32qi)(__m256i)(S)); })
#define _mm256_maskz_gf2p8affine_epi64_epi8(U, A, B, I) __extension__ ({ \
(__m256i)_mm256_mask_gf2p8affine_epi64_epi8((__m256i)_mm256_setzero_si256(), \
U, A, B, I); })
#define _mm512_gf2p8affine_epi64_epi8(A, B, I) __extension__ ({ \
(__m512i)__builtin_ia32_vgf2p8affineqb_v64qi((__v64qi)(__m512i)(A), \
(__v64qi)(__m512i)(B), \
(char)(I)); })
#define _mm512_mask_gf2p8affine_epi64_epi8(S, U, A, B, I) __extension__ ({ \
(__m512i)__builtin_ia32_selectb_512((__mmask64)(U), \
(__v64qi)_mm512_gf2p8affine_epi64_epi8(A, B, I), \
(__v64qi)(__m512i)(S)); })
#define _mm512_maskz_gf2p8affine_epi64_epi8(U, A, B, I) __extension__ ({ \
(__m512i)_mm512_mask_gf2p8affine_epi64_epi8((__m512i)_mm512_setzero_qi(), \
U, A, B, I); })
/* Default attributes for simple form (no masking). */
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("gfni")))
/* Default attributes for ZMM forms. */
#define __DEFAULT_FN_ATTRS_F __attribute__((__always_inline__, __nodebug__, __target__("avx512bw,gfni")))
/* Default attributes for VLX forms. */
#define __DEFAULT_FN_ATTRS_VL __attribute__((__always_inline__, __nodebug__, __target__("avx512bw,avx512vl,gfni")))
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_gf2p8mul_epi8(__m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_vgf2p8mulb_v16qi((__v16qi) __A,
(__v16qi) __B);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS_VL
_mm_mask_gf2p8mul_epi8(__m128i __S, __mmask16 __U, __m128i __A, __m128i __B)
{
return (__m128i) __builtin_ia32_selectb_128(__U,
(__v16qi) _mm_gf2p8mul_epi8(__A, __B),
(__v16qi) __S);
}
static __inline__ __m128i __DEFAULT_FN_ATTRS_VL
_mm_maskz_gf2p8mul_epi8(__mmask16 __U, __m128i __A, __m128i __B)
{
return _mm_mask_gf2p8mul_epi8((__m128i)_mm_setzero_si128(),
__U, __A, __B);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_gf2p8mul_epi8(__m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_vgf2p8mulb_v32qi((__v32qi) __A,
(__v32qi) __B);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS_VL
_mm256_mask_gf2p8mul_epi8(__m256i __S, __mmask32 __U, __m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_selectb_256(__U,
(__v32qi) _mm256_gf2p8mul_epi8(__A, __B),
(__v32qi) __S);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS_VL
_mm256_maskz_gf2p8mul_epi8(__mmask32 __U, __m256i __A, __m256i __B)
{
return _mm256_mask_gf2p8mul_epi8((__m256i)_mm256_setzero_si256(),
__U, __A, __B);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS_F
_mm512_gf2p8mul_epi8(__m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_vgf2p8mulb_v64qi((__v64qi) __A,
(__v64qi) __B);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS_F
_mm512_mask_gf2p8mul_epi8(__m512i __S, __mmask64 __U, __m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_selectb_512(__U,
(__v64qi) _mm512_gf2p8mul_epi8(__A, __B),
(__v64qi) __S);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS_F
_mm512_maskz_gf2p8mul_epi8(__mmask64 __U, __m512i __A, __m512i __B)
{
return _mm512_mask_gf2p8mul_epi8((__m512i)_mm512_setzero_qi(),
__U, __A, __B);
}
#undef __DEFAULT_FN_ATTRS
#undef __DEFAULT_FN_ATTRS_F
#undef __DEFAULT_FN_ATTRS_VL
#endif // __GFNIINTRIN_H
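
Note on the new header above: the gf2p8mul intrinsics multiply bytes in GF(2^8), and the mask/maskz forms blend the result with a source vector (or zero) per mask bit through the selectb builtins. A scalar sketch of both steps follows, assuming the reduction polynomial x^8 + x^4 + x^3 + x + 1 that Intel documents for GF2P8MULB. The names gf2p8mul_byte and mask_gf2p8mul_16 are hypothetical and are not part of the header.

```c
#include <stdint.h>

/* Hypothetical scalar model of one byte lane: carry-less multiply of a and b,
   reduced modulo x^8 + x^4 + x^3 + x + 1 (0x11B). */
static uint8_t gf2p8mul_byte(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        if (b & 1u)
            r ^= a;                 /* add (xor) the current multiple of a */
        uint8_t hi = a & 0x80u;     /* will the shift overflow 8 bits? */
        a = (uint8_t)(a << 1);
        if (hi)
            a ^= 0x1Bu;             /* reduce by the low byte of 0x11B */
        b >>= 1;
    }
    return r;
}

/* Hypothetical model of the 128-bit masked form: lane i takes the product
   when bit i of mask is set, otherwise it keeps src[i]. Passing an all-zero
   src models the maskz variant. */
static void mask_gf2p8mul_16(uint8_t dst[16], const uint8_t src[16],
                             uint16_t mask,
                             const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; ++i) {
        uint8_t prod = gf2p8mul_byte(a[i], b[i]);
        dst[i] = ((mask >> i) & 1u) ? prod : src[i];
    }
}
```

The loop form is chosen for readability; it makes no claim about how the instruction is implemented in hardware.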


@@ -58,6 +58,10 @@
#include <clflushoptintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLWB__)
#include <clwbintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX__)
#include <avxintrin.h>
#endif
@@ -114,6 +118,10 @@ _mm256_cvtph_ps(__m128i __a)
}
#endif /* __AVX2__ */
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__VPCLMULQDQ__)
#include <vpclmulqdqintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI__)
#include <bmiintrin.h>
#endif
@@ -142,6 +150,10 @@ _mm256_cvtph_ps(__m128i __a)
#include <avx512bwintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX512BITALG__)
#include <avx512bitalgintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX512CD__)
#include <avx512cdintrin.h>
#endif
@@ -150,10 +162,29 @@ _mm256_cvtph_ps(__m128i __a)
#include <avx512vpopcntdqintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512VPOPCNTDQ__))
#include <avx512vpopcntdqvlintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX512VNNI__)
#include <avx512vnniintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512VNNI__))
#include <avx512vlvnniintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX512DQ__)
#include <avx512dqintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512BITALG__))
#include <avx512vlbitalgintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512BW__))
#include <avx512vlbwintrin.h>
@@ -191,6 +222,15 @@ _mm256_cvtph_ps(__m128i __a)
#include <avx512vbmivlintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX512VBMI2__)
#include <avx512vbmi2intrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || \
(defined(__AVX512VBMI2__) && defined(__AVX512VL__))
#include <avx512vlvbmi2intrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX512PF__)
#include <avx512pfintrin.h>
#endif
@@ -199,6 +239,14 @@ _mm256_cvtph_ps(__m128i __a)
#include <pkuintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__VAES__)
#include <vaesintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__GFNI__)
#include <gfniintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__RDRND__)
static __inline__ int __attribute__((__always_inline__, __nodebug__, __target__("rdrnd")))
_rdrand16_step(unsigned short *__p)
@@ -315,6 +363,10 @@ _writegsbase_u64(unsigned long long __V)
#include <xsavesintrin.h>
#endif
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__SHSTK__)
#include <cetintrin.h>
#endif
/* Some intrinsics inside adxintrin.h are available only on processors with ADX,
* whereas others are also available at all times. */
#include <adxintrin.h>


@@ -38,6 +38,10 @@
#include <armintr.h>
#endif
#if defined(_M_ARM64)
#include <arm64intr.h>
#endif
/* For the definition of jmp_buf. */
#if __STDC_HOSTED__
#include <setjmp.h>
@@ -828,7 +832,7 @@ _InterlockedCompareExchange_nf(long volatile *_Destination,
__ATOMIC_SEQ_CST, __ATOMIC_RELAXED);
return _Comparand;
}
static __inline__ short __DEFAULT_FN_ATTRS
static __inline__ long __DEFAULT_FN_ATTRS
_InterlockedCompareExchange_rel(long volatile *_Destination,
long _Exchange, long _Comparand) {
__atomic_compare_exchange(_Destination, &_Comparand, &_Exchange, 0,


@@ -11381,6 +11381,8 @@ half16 __ovld __cnfn bitselect(half16 a, half16 b, half16 c);
* For each component of a vector type,
* result[i] = if MSB of c[i] is set ? b[i] : a[i].
* For a scalar type, result = c ? b : a.
* b and a must have the same type.
* c must have the same number of elements and bits as a.
*/
char __ovld __cnfn select(char a, char b, char c);
uchar __ovld __cnfn select(uchar a, uchar b, char c);
@@ -11394,60 +11396,7 @@ char8 __ovld __cnfn select(char8 a, char8 b, char8 c);
uchar8 __ovld __cnfn select(uchar8 a, uchar8 b, char8 c);
char16 __ovld __cnfn select(char16 a, char16 b, char16 c);
uchar16 __ovld __cnfn select(uchar16 a, uchar16 b, char16 c);
short __ovld __cnfn select(short a, short b, char c);
ushort __ovld __cnfn select(ushort a, ushort b, char c);
short2 __ovld __cnfn select(short2 a, short2 b, char2 c);
ushort2 __ovld __cnfn select(ushort2 a, ushort2 b, char2 c);
short3 __ovld __cnfn select(short3 a, short3 b, char3 c);
ushort3 __ovld __cnfn select(ushort3 a, ushort3 b, char3 c);
short4 __ovld __cnfn select(short4 a, short4 b, char4 c);
ushort4 __ovld __cnfn select(ushort4 a, ushort4 b, char4 c);
short8 __ovld __cnfn select(short8 a, short8 b, char8 c);
ushort8 __ovld __cnfn select(ushort8 a, ushort8 b, char8 c);
short16 __ovld __cnfn select(short16 a, short16 b, char16 c);
ushort16 __ovld __cnfn select(ushort16 a, ushort16 b, char16 c);
int __ovld __cnfn select(int a, int b, char c);
uint __ovld __cnfn select(uint a, uint b, char c);
int2 __ovld __cnfn select(int2 a, int2 b, char2 c);
uint2 __ovld __cnfn select(uint2 a, uint2 b, char2 c);
int3 __ovld __cnfn select(int3 a, int3 b, char3 c);
uint3 __ovld __cnfn select(uint3 a, uint3 b, char3 c);
int4 __ovld __cnfn select(int4 a, int4 b, char4 c);
uint4 __ovld __cnfn select(uint4 a, uint4 b, char4 c);
int8 __ovld __cnfn select(int8 a, int8 b, char8 c);
uint8 __ovld __cnfn select(uint8 a, uint8 b, char8 c);
int16 __ovld __cnfn select(int16 a, int16 b, char16 c);
uint16 __ovld __cnfn select(uint16 a, uint16 b, char16 c);
long __ovld __cnfn select(long a, long b, char c);
ulong __ovld __cnfn select(ulong a, ulong b, char c);
long2 __ovld __cnfn select(long2 a, long2 b, char2 c);
ulong2 __ovld __cnfn select(ulong2 a, ulong2 b, char2 c);
long3 __ovld __cnfn select(long3 a, long3 b, char3 c);
ulong3 __ovld __cnfn select(ulong3 a, ulong3 b, char3 c);
long4 __ovld __cnfn select(long4 a, long4 b, char4 c);
ulong4 __ovld __cnfn select(ulong4 a, ulong4 b, char4 c);
long8 __ovld __cnfn select(long8 a, long8 b, char8 c);
ulong8 __ovld __cnfn select(ulong8 a, ulong8 b, char8 c);
long16 __ovld __cnfn select(long16 a, long16 b, char16 c);
ulong16 __ovld __cnfn select(ulong16 a, ulong16 b, char16 c);
float __ovld __cnfn select(float a, float b, char c);
float2 __ovld __cnfn select(float2 a, float2 b, char2 c);
float3 __ovld __cnfn select(float3 a, float3 b, char3 c);
float4 __ovld __cnfn select(float4 a, float4 b, char4 c);
float8 __ovld __cnfn select(float8 a, float8 b, char8 c);
float16 __ovld __cnfn select(float16 a, float16 b, char16 c);
char __ovld __cnfn select(char a, char b, short c);
uchar __ovld __cnfn select(uchar a, uchar b, short c);
char2 __ovld __cnfn select(char2 a, char2 b, short2 c);
uchar2 __ovld __cnfn select(uchar2 a, uchar2 b, short2 c);
char3 __ovld __cnfn select(char3 a, char3 b, short3 c);
uchar3 __ovld __cnfn select(uchar3 a, uchar3 b, short3 c);
char4 __ovld __cnfn select(char4 a, char4 b, short4 c);
uchar4 __ovld __cnfn select(uchar4 a, uchar4 b, short4 c);
char8 __ovld __cnfn select(char8 a, char8 b, short8 c);
uchar8 __ovld __cnfn select(uchar8 a, uchar8 b, short8 c);
char16 __ovld __cnfn select(char16 a, char16 b, short16 c);
uchar16 __ovld __cnfn select(uchar16 a, uchar16 b, short16 c);
short __ovld __cnfn select(short a, short b, short c);
ushort __ovld __cnfn select(ushort a, ushort b, short c);
short2 __ovld __cnfn select(short2 a, short2 b, short2 c);
@@ -11460,60 +11409,7 @@ short8 __ovld __cnfn select(short8 a, short8 b, short8 c);
ushort8 __ovld __cnfn select(ushort8 a, ushort8 b, short8 c);
short16 __ovld __cnfn select(short16 a, short16 b, short16 c);
ushort16 __ovld __cnfn select(ushort16 a, ushort16 b, short16 c);
int __ovld __cnfn select(int a, int b, short c);
uint __ovld __cnfn select(uint a, uint b, short c);
int2 __ovld __cnfn select(int2 a, int2 b, short2 c);
uint2 __ovld __cnfn select(uint2 a, uint2 b, short2 c);
int3 __ovld __cnfn select(int3 a, int3 b, short3 c);
uint3 __ovld __cnfn select(uint3 a, uint3 b, short3 c);
int4 __ovld __cnfn select(int4 a, int4 b, short4 c);
uint4 __ovld __cnfn select(uint4 a, uint4 b, short4 c);
int8 __ovld __cnfn select(int8 a, int8 b, short8 c);
uint8 __ovld __cnfn select(uint8 a, uint8 b, short8 c);
int16 __ovld __cnfn select(int16 a, int16 b, short16 c);
uint16 __ovld __cnfn select(uint16 a, uint16 b, short16 c);
long __ovld __cnfn select(long a, long b, short c);
ulong __ovld __cnfn select(ulong a, ulong b, short c);
long2 __ovld __cnfn select(long2 a, long2 b, short2 c);
ulong2 __ovld __cnfn select(ulong2 a, ulong2 b, short2 c);
long3 __ovld __cnfn select(long3 a, long3 b, short3 c);
ulong3 __ovld __cnfn select(ulong3 a, ulong3 b, short3 c);
long4 __ovld __cnfn select(long4 a, long4 b, short4 c);
ulong4 __ovld __cnfn select(ulong4 a, ulong4 b, short4 c);
long8 __ovld __cnfn select(long8 a, long8 b, short8 c);
ulong8 __ovld __cnfn select(ulong8 a, ulong8 b, short8 c);
long16 __ovld __cnfn select(long16 a, long16 b, short16 c);
ulong16 __ovld __cnfn select(ulong16 a, ulong16 b, short16 c);
float __ovld __cnfn select(float a, float b, short c);
float2 __ovld __cnfn select(float2 a, float2 b, short2 c);
float3 __ovld __cnfn select(float3 a, float3 b, short3 c);
float4 __ovld __cnfn select(float4 a, float4 b, short4 c);
float8 __ovld __cnfn select(float8 a, float8 b, short8 c);
float16 __ovld __cnfn select(float16 a, float16 b, short16 c);
char __ovld __cnfn select(char a, char b, int c);
uchar __ovld __cnfn select(uchar a, uchar b, int c);
char2 __ovld __cnfn select(char2 a, char2 b, int2 c);
uchar2 __ovld __cnfn select(uchar2 a, uchar2 b, int2 c);
char3 __ovld __cnfn select(char3 a, char3 b, int3 c);
uchar3 __ovld __cnfn select(uchar3 a, uchar3 b, int3 c);
char4 __ovld __cnfn select(char4 a, char4 b, int4 c);
uchar4 __ovld __cnfn select(uchar4 a, uchar4 b, int4 c);
char8 __ovld __cnfn select(char8 a, char8 b, int8 c);
uchar8 __ovld __cnfn select(uchar8 a, uchar8 b, int8 c);
char16 __ovld __cnfn select(char16 a, char16 b, int16 c);
uchar16 __ovld __cnfn select(uchar16 a, uchar16 b, int16 c);
short __ovld __cnfn select(short a, short b, int c);
ushort __ovld __cnfn select(ushort a, ushort b, int c);
short2 __ovld __cnfn select(short2 a, short2 b, int2 c);
ushort2 __ovld __cnfn select(ushort2 a, ushort2 b, int2 c);
short3 __ovld __cnfn select(short3 a, short3 b, int3 c);
ushort3 __ovld __cnfn select(ushort3 a, ushort3 b, int3 c);
short4 __ovld __cnfn select(short4 a, short4 b, int4 c);
ushort4 __ovld __cnfn select(ushort4 a, ushort4 b, int4 c);
short8 __ovld __cnfn select(short8 a, short8 b, int8 c);
ushort8 __ovld __cnfn select(ushort8 a, ushort8 b, int8 c);
short16 __ovld __cnfn select(short16 a, short16 b, int16 c);
ushort16 __ovld __cnfn select(ushort16 a, ushort16 b, int16 c);
int __ovld __cnfn select(int a, int b, int c);
uint __ovld __cnfn select(uint a, uint b, int c);
int2 __ovld __cnfn select(int2 a, int2 b, int2 c);
@@ -11526,60 +11422,13 @@ int8 __ovld __cnfn select(int8 a, int8 b, int8 c);
uint8 __ovld __cnfn select(uint8 a, uint8 b, int8 c);
int16 __ovld __cnfn select(int16 a, int16 b, int16 c);
uint16 __ovld __cnfn select(uint16 a, uint16 b, int16 c);
long __ovld __cnfn select(long a, long b, int c);
ulong __ovld __cnfn select(ulong a, ulong b, int c);
long2 __ovld __cnfn select(long2 a, long2 b, int2 c);
ulong2 __ovld __cnfn select(ulong2 a, ulong2 b, int2 c);
long3 __ovld __cnfn select(long3 a, long3 b, int3 c);
ulong3 __ovld __cnfn select(ulong3 a, ulong3 b, int3 c);
long4 __ovld __cnfn select(long4 a, long4 b, int4 c);
ulong4 __ovld __cnfn select(ulong4 a, ulong4 b, int4 c);
long8 __ovld __cnfn select(long8 a, long8 b, int8 c);
ulong8 __ovld __cnfn select(ulong8 a, ulong8 b, int8 c);
long16 __ovld __cnfn select(long16 a, long16 b, int16 c);
ulong16 __ovld __cnfn select(ulong16 a, ulong16 b, int16 c);
float __ovld __cnfn select(float a, float b, int c);
float2 __ovld __cnfn select(float2 a, float2 b, int2 c);
float3 __ovld __cnfn select(float3 a, float3 b, int3 c);
float4 __ovld __cnfn select(float4 a, float4 b, int4 c);
float8 __ovld __cnfn select(float8 a, float8 b, int8 c);
float16 __ovld __cnfn select(float16 a, float16 b, int16 c);
char __ovld __cnfn select(char a, char b, long c);
uchar __ovld __cnfn select(uchar a, uchar b, long c);
char2 __ovld __cnfn select(char2 a, char2 b, long2 c);
uchar2 __ovld __cnfn select(uchar2 a, uchar2 b, long2 c);
char3 __ovld __cnfn select(char3 a, char3 b, long3 c);
uchar3 __ovld __cnfn select(uchar3 a, uchar3 b, long3 c);
char4 __ovld __cnfn select(char4 a, char4 b, long4 c);
uchar4 __ovld __cnfn select(uchar4 a, uchar4 b, long4 c);
char8 __ovld __cnfn select(char8 a, char8 b, long8 c);
uchar8 __ovld __cnfn select(uchar8 a, uchar8 b, long8 c);
char16 __ovld __cnfn select(char16 a, char16 b, long16 c);
uchar16 __ovld __cnfn select(uchar16 a, uchar16 b, long16 c);
short __ovld __cnfn select(short a, short b, long c);
ushort __ovld __cnfn select(ushort a, ushort b, long c);
short2 __ovld __cnfn select(short2 a, short2 b, long2 c);
ushort2 __ovld __cnfn select(ushort2 a, ushort2 b, long2 c);
short3 __ovld __cnfn select(short3 a, short3 b, long3 c);
ushort3 __ovld __cnfn select(ushort3 a, ushort3 b, long3 c);
short4 __ovld __cnfn select(short4 a, short4 b, long4 c);
ushort4 __ovld __cnfn select(ushort4 a, ushort4 b, long4 c);
short8 __ovld __cnfn select(short8 a, short8 b, long8 c);
ushort8 __ovld __cnfn select(ushort8 a, ushort8 b, long8 c);
short16 __ovld __cnfn select(short16 a, short16 b, long16 c);
ushort16 __ovld __cnfn select(ushort16 a, ushort16 b, long16 c);
int __ovld __cnfn select(int a, int b, long c);
uint __ovld __cnfn select(uint a, uint b, long c);
int2 __ovld __cnfn select(int2 a, int2 b, long2 c);
uint2 __ovld __cnfn select(uint2 a, uint2 b, long2 c);
int3 __ovld __cnfn select(int3 a, int3 b, long3 c);
uint3 __ovld __cnfn select(uint3 a, uint3 b, long3 c);
int4 __ovld __cnfn select(int4 a, int4 b, long4 c);
uint4 __ovld __cnfn select(uint4 a, uint4 b, long4 c);
int8 __ovld __cnfn select(int8 a, int8 b, long8 c);
uint8 __ovld __cnfn select(uint8 a, uint8 b, long8 c);
int16 __ovld __cnfn select(int16 a, int16 b, long16 c);
uint16 __ovld __cnfn select(uint16 a, uint16 b, long16 c);
long __ovld __cnfn select(long a, long b, long c);
ulong __ovld __cnfn select(ulong a, ulong b, long c);
long2 __ovld __cnfn select(long2 a, long2 b, long2 c);
@@ -11592,12 +11441,7 @@ long8 __ovld __cnfn select(long8 a, long8 b, long8 c);
ulong8 __ovld __cnfn select(ulong8 a, ulong8 b, long8 c);
long16 __ovld __cnfn select(long16 a, long16 b, long16 c);
ulong16 __ovld __cnfn select(ulong16 a, ulong16 b, long16 c);
float __ovld __cnfn select(float a, float b, long c);
float2 __ovld __cnfn select(float2 a, float2 b, long2 c);
float3 __ovld __cnfn select(float3 a, float3 b, long3 c);
float4 __ovld __cnfn select(float4 a, float4 b, long4 c);
float8 __ovld __cnfn select(float8 a, float8 b, long8 c);
float16 __ovld __cnfn select(float16 a, float16 b, long16 c);
char __ovld __cnfn select(char a, char b, uchar c);
uchar __ovld __cnfn select(uchar a, uchar b, uchar c);
char2 __ovld __cnfn select(char2 a, char2 b, uchar2 c);
@@ -11610,60 +11454,7 @@ char8 __ovld __cnfn select(char8 a, char8 b, uchar8 c);
uchar8 __ovld __cnfn select(uchar8 a, uchar8 b, uchar8 c);
char16 __ovld __cnfn select(char16 a, char16 b, uchar16 c);
uchar16 __ovld __cnfn select(uchar16 a, uchar16 b, uchar16 c);
short __ovld __cnfn select(short a, short b, uchar c);
ushort __ovld __cnfn select(ushort a, ushort b, uchar c);
short2 __ovld __cnfn select(short2 a, short2 b, uchar2 c);
ushort2 __ovld __cnfn select(ushort2 a, ushort2 b, uchar2 c);
short3 __ovld __cnfn select(short3 a, short3 b, uchar3 c);
ushort3 __ovld __cnfn select(ushort3 a, ushort3 b, uchar3 c);
short4 __ovld __cnfn select(short4 a, short4 b, uchar4 c);
ushort4 __ovld __cnfn select(ushort4 a, ushort4 b, uchar4 c);
short8 __ovld __cnfn select(short8 a, short8 b, uchar8 c);
ushort8 __ovld __cnfn select(ushort8 a, ushort8 b, uchar8 c);
short16 __ovld __cnfn select(short16 a, short16 b, uchar16 c);
ushort16 __ovld __cnfn select(ushort16 a, ushort16 b, uchar16 c);
int __ovld __cnfn select(int a, int b, uchar c);
uint __ovld __cnfn select(uint a, uint b, uchar c);
int2 __ovld __cnfn select(int2 a, int2 b, uchar2 c);
uint2 __ovld __cnfn select(uint2 a, uint2 b, uchar2 c);
int3 __ovld __cnfn select(int3 a, int3 b, uchar3 c);
uint3 __ovld __cnfn select(uint3 a, uint3 b, uchar3 c);
int4 __ovld __cnfn select(int4 a, int4 b, uchar4 c);
uint4 __ovld __cnfn select(uint4 a, uint4 b, uchar4 c);
int8 __ovld __cnfn select(int8 a, int8 b, uchar8 c);
uint8 __ovld __cnfn select(uint8 a, uint8 b, uchar8 c);
int16 __ovld __cnfn select(int16 a, int16 b, uchar16 c);
uint16 __ovld __cnfn select(uint16 a, uint16 b, uchar16 c);
long __ovld __cnfn select(long a, long b, uchar c);
ulong __ovld __cnfn select(ulong a, ulong b, uchar c);
long2 __ovld __cnfn select(long2 a, long2 b, uchar2 c);
ulong2 __ovld __cnfn select(ulong2 a, ulong2 b, uchar2 c);
long3 __ovld __cnfn select(long3 a, long3 b, uchar3 c);
ulong3 __ovld __cnfn select(ulong3 a, ulong3 b, uchar3 c);
long4 __ovld __cnfn select(long4 a, long4 b, uchar4 c);
ulong4 __ovld __cnfn select(ulong4 a, ulong4 b, uchar4 c);
long8 __ovld __cnfn select(long8 a, long8 b, uchar8 c);
ulong8 __ovld __cnfn select(ulong8 a, ulong8 b, uchar8 c);
long16 __ovld __cnfn select(long16 a, long16 b, uchar16 c);
ulong16 __ovld __cnfn select(ulong16 a, ulong16 b, uchar16 c);
float __ovld __cnfn select(float a, float b, uchar c);
float2 __ovld __cnfn select(float2 a, float2 b, uchar2 c);
float3 __ovld __cnfn select(float3 a, float3 b, uchar3 c);
float4 __ovld __cnfn select(float4 a, float4 b, uchar4 c);
float8 __ovld __cnfn select(float8 a, float8 b, uchar8 c);
float16 __ovld __cnfn select(float16 a, float16 b, uchar16 c);
char __ovld __cnfn select(char a, char b, ushort c);
uchar __ovld __cnfn select(uchar a, uchar b, ushort c);
char2 __ovld __cnfn select(char2 a, char2 b, ushort2 c);
uchar2 __ovld __cnfn select(uchar2 a, uchar2 b, ushort2 c);
char3 __ovld __cnfn select(char3 a, char3 b, ushort3 c);
uchar3 __ovld __cnfn select(uchar3 a, uchar3 b, ushort3 c);
char4 __ovld __cnfn select(char4 a, char4 b, ushort4 c);
uchar4 __ovld __cnfn select(uchar4 a, uchar4 b, ushort4 c);
char8 __ovld __cnfn select(char8 a, char8 b, ushort8 c);
uchar8 __ovld __cnfn select(uchar8 a, uchar8 b, ushort8 c);
char16 __ovld __cnfn select(char16 a, char16 b, ushort16 c);
uchar16 __ovld __cnfn select(uchar16 a, uchar16 b, ushort16 c);
short __ovld __cnfn select(short a, short b, ushort c);
ushort __ovld __cnfn select(ushort a, ushort b, ushort c);
short2 __ovld __cnfn select(short2 a, short2 b, ushort2 c);
@@ -11676,60 +11467,7 @@ short8 __ovld __cnfn select(short8 a, short8 b, ushort8 c);
ushort8 __ovld __cnfn select(ushort8 a, ushort8 b, ushort8 c);
short16 __ovld __cnfn select(short16 a, short16 b, ushort16 c);
ushort16 __ovld __cnfn select(ushort16 a, ushort16 b, ushort16 c);
int __ovld __cnfn select(int a, int b, ushort c);
uint __ovld __cnfn select(uint a, uint b, ushort c);
int2 __ovld __cnfn select(int2 a, int2 b, ushort2 c);
uint2 __ovld __cnfn select(uint2 a, uint2 b, ushort2 c);
int3 __ovld __cnfn select(int3 a, int3 b, ushort3 c);
uint3 __ovld __cnfn select(uint3 a, uint3 b, ushort3 c);
int4 __ovld __cnfn select(int4 a, int4 b, ushort4 c);
uint4 __ovld __cnfn select(uint4 a, uint4 b, ushort4 c);
int8 __ovld __cnfn select(int8 a, int8 b, ushort8 c);
uint8 __ovld __cnfn select(uint8 a, uint8 b, ushort8 c);
int16 __ovld __cnfn select(int16 a, int16 b, ushort16 c);
uint16 __ovld __cnfn select(uint16 a, uint16 b, ushort16 c);
long __ovld __cnfn select(long a, long b, ushort c);
ulong __ovld __cnfn select(ulong a, ulong b, ushort c);
long2 __ovld __cnfn select(long2 a, long2 b, ushort2 c);
ulong2 __ovld __cnfn select(ulong2 a, ulong2 b, ushort2 c);
long3 __ovld __cnfn select(long3 a, long3 b, ushort3 c);
ulong3 __ovld __cnfn select(ulong3 a, ulong3 b, ushort3 c);
long4 __ovld __cnfn select(long4 a, long4 b, ushort4 c);
ulong4 __ovld __cnfn select(ulong4 a, ulong4 b, ushort4 c);
long8 __ovld __cnfn select(long8 a, long8 b, ushort8 c);
ulong8 __ovld __cnfn select(ulong8 a, ulong8 b, ushort8 c);
long16 __ovld __cnfn select(long16 a, long16 b, ushort16 c);
ulong16 __ovld __cnfn select(ulong16 a, ulong16 b, ushort16 c);
float __ovld __cnfn select(float a, float b, ushort c);
float2 __ovld __cnfn select(float2 a, float2 b, ushort2 c);
float3 __ovld __cnfn select(float3 a, float3 b, ushort3 c);
float4 __ovld __cnfn select(float4 a, float4 b, ushort4 c);
float8 __ovld __cnfn select(float8 a, float8 b, ushort8 c);
float16 __ovld __cnfn select(float16 a, float16 b, ushort16 c);
char __ovld __cnfn select(char a, char b, uint c);
uchar __ovld __cnfn select(uchar a, uchar b, uint c);
char2 __ovld __cnfn select(char2 a, char2 b, uint2 c);
uchar2 __ovld __cnfn select(uchar2 a, uchar2 b, uint2 c);
char3 __ovld __cnfn select(char3 a, char3 b, uint3 c);
uchar3 __ovld __cnfn select(uchar3 a, uchar3 b, uint3 c);
char4 __ovld __cnfn select(char4 a, char4 b, uint4 c);
uchar4 __ovld __cnfn select(uchar4 a, uchar4 b, uint4 c);
char8 __ovld __cnfn select(char8 a, char8 b, uint8 c);
uchar8 __ovld __cnfn select(uchar8 a, uchar8 b, uint8 c);
char16 __ovld __cnfn select(char16 a, char16 b, uint16 c);
uchar16 __ovld __cnfn select(uchar16 a, uchar16 b, uint16 c);
short __ovld __cnfn select(short a, short b, uint c);
ushort __ovld __cnfn select(ushort a, ushort b, uint c);
short2 __ovld __cnfn select(short2 a, short2 b, uint2 c);
ushort2 __ovld __cnfn select(ushort2 a, ushort2 b, uint2 c);
short3 __ovld __cnfn select(short3 a, short3 b, uint3 c);
ushort3 __ovld __cnfn select(ushort3 a, ushort3 b, uint3 c);
short4 __ovld __cnfn select(short4 a, short4 b, uint4 c);
ushort4 __ovld __cnfn select(ushort4 a, ushort4 b, uint4 c);
short8 __ovld __cnfn select(short8 a, short8 b, uint8 c);
ushort8 __ovld __cnfn select(ushort8 a, ushort8 b, uint8 c);
short16 __ovld __cnfn select(short16 a, short16 b, uint16 c);
ushort16 __ovld __cnfn select(ushort16 a, ushort16 b, uint16 c);
int __ovld __cnfn select(int a, int b, uint c);
uint __ovld __cnfn select(uint a, uint b, uint c);
int2 __ovld __cnfn select(int2 a, int2 b, uint2 c);
@@ -11742,60 +11480,13 @@ int8 __ovld __cnfn select(int8 a, int8 b, uint8 c);
uint8 __ovld __cnfn select(uint8 a, uint8 b, uint8 c);
int16 __ovld __cnfn select(int16 a, int16 b, uint16 c);
uint16 __ovld __cnfn select(uint16 a, uint16 b, uint16 c);
long __ovld __cnfn select(long a, long b, uint c);
ulong __ovld __cnfn select(ulong a, ulong b, uint c);
long2 __ovld __cnfn select(long2 a, long2 b, uint2 c);
ulong2 __ovld __cnfn select(ulong2 a, ulong2 b, uint2 c);
long3 __ovld __cnfn select(long3 a, long3 b, uint3 c);
ulong3 __ovld __cnfn select(ulong3 a, ulong3 b, uint3 c);
long4 __ovld __cnfn select(long4 a, long4 b, uint4 c);
ulong4 __ovld __cnfn select(ulong4 a, ulong4 b, uint4 c);
long8 __ovld __cnfn select(long8 a, long8 b, uint8 c);
ulong8 __ovld __cnfn select(ulong8 a, ulong8 b, uint8 c);
long16 __ovld __cnfn select(long16 a, long16 b, uint16 c);
ulong16 __ovld __cnfn select(ulong16 a, ulong16 b, uint16 c);
float __ovld __cnfn select(float a, float b, uint c);
float2 __ovld __cnfn select(float2 a, float2 b, uint2 c);
float3 __ovld __cnfn select(float3 a, float3 b, uint3 c);
float4 __ovld __cnfn select(float4 a, float4 b, uint4 c);
float8 __ovld __cnfn select(float8 a, float8 b, uint8 c);
float16 __ovld __cnfn select(float16 a, float16 b, uint16 c);
char __ovld __cnfn select(char a, char b, ulong c);
uchar __ovld __cnfn select(uchar a, uchar b, ulong c);
char2 __ovld __cnfn select(char2 a, char2 b, ulong2 c);
uchar2 __ovld __cnfn select(uchar2 a, uchar2 b, ulong2 c);
char3 __ovld __cnfn select(char3 a, char3 b, ulong3 c);
uchar3 __ovld __cnfn select(uchar3 a, uchar3 b, ulong3 c);
char4 __ovld __cnfn select(char4 a, char4 b, ulong4 c);
uchar4 __ovld __cnfn select(uchar4 a, uchar4 b, ulong4 c);
char8 __ovld __cnfn select(char8 a, char8 b, ulong8 c);
uchar8 __ovld __cnfn select(uchar8 a, uchar8 b, ulong8 c);
char16 __ovld __cnfn select(char16 a, char16 b, ulong16 c);
uchar16 __ovld __cnfn select(uchar16 a, uchar16 b, ulong16 c);
short __ovld __cnfn select(short a, short b, ulong c);
ushort __ovld __cnfn select(ushort a, ushort b, ulong c);
short2 __ovld __cnfn select(short2 a, short2 b, ulong2 c);
ushort2 __ovld __cnfn select(ushort2 a, ushort2 b, ulong2 c);
short3 __ovld __cnfn select(short3 a, short3 b, ulong3 c);
ushort3 __ovld __cnfn select(ushort3 a, ushort3 b, ulong3 c);
short4 __ovld __cnfn select(short4 a, short4 b, ulong4 c);
ushort4 __ovld __cnfn select(ushort4 a, ushort4 b, ulong4 c);
short8 __ovld __cnfn select(short8 a, short8 b, ulong8 c);
ushort8 __ovld __cnfn select(ushort8 a, ushort8 b, ulong8 c);
short16 __ovld __cnfn select(short16 a, short16 b, ulong16 c);
ushort16 __ovld __cnfn select(ushort16 a, ushort16 b, ulong16 c);
int __ovld __cnfn select(int a, int b, ulong c);
uint __ovld __cnfn select(uint a, uint b, ulong c);
int2 __ovld __cnfn select(int2 a, int2 b, ulong2 c);
uint2 __ovld __cnfn select(uint2 a, uint2 b, ulong2 c);
int3 __ovld __cnfn select(int3 a, int3 b, ulong3 c);
uint3 __ovld __cnfn select(uint3 a, uint3 b, ulong3 c);
int4 __ovld __cnfn select(int4 a, int4 b, ulong4 c);
uint4 __ovld __cnfn select(uint4 a, uint4 b, ulong4 c);
int8 __ovld __cnfn select(int8 a, int8 b, ulong8 c);
uint8 __ovld __cnfn select(uint8 a, uint8 b, ulong8 c);
int16 __ovld __cnfn select(int16 a, int16 b, ulong16 c);
uint16 __ovld __cnfn select(uint16 a, uint16 b, ulong16 c);
long __ovld __cnfn select(long a, long b, ulong c);
ulong __ovld __cnfn select(ulong a, ulong b, ulong c);
long2 __ovld __cnfn select(long2 a, long2 b, ulong2 c);
@@ -11808,12 +11499,7 @@ long8 __ovld __cnfn select(long8 a, long8 b, ulong8 c);
ulong8 __ovld __cnfn select(ulong8 a, ulong8 b, ulong8 c);
long16 __ovld __cnfn select(long16 a, long16 b, ulong16 c);
ulong16 __ovld __cnfn select(ulong16 a, ulong16 b, ulong16 c);
float __ovld __cnfn select(float a, float b, ulong c);
float2 __ovld __cnfn select(float2 a, float2 b, ulong2 c);
float3 __ovld __cnfn select(float3 a, float3 b, ulong3 c);
float4 __ovld __cnfn select(float4 a, float4 b, ulong4 c);
float8 __ovld __cnfn select(float8 a, float8 b, ulong8 c);
float16 __ovld __cnfn select(float16 a, float16 b, ulong16 c);
#ifdef cl_khr_fp64
double __ovld __cnfn select(double a, double b, long c);
double2 __ovld __cnfn select(double2 a, double2 b, long2 c);
@@ -13141,13 +12827,14 @@ void __ovld __conv barrier(cl_mem_fence_flags flags);
#if __OPENCL_C_VERSION__ >= CL_VERSION_2_0
typedef enum memory_scope
{
memory_scope_work_item,
memory_scope_work_group,
memory_scope_device,
memory_scope_all_svm_devices,
memory_scope_sub_group
typedef enum memory_scope {
memory_scope_work_item = __OPENCL_MEMORY_SCOPE_WORK_ITEM,
memory_scope_work_group = __OPENCL_MEMORY_SCOPE_WORK_GROUP,
memory_scope_device = __OPENCL_MEMORY_SCOPE_DEVICE,
memory_scope_all_svm_devices = __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES,
#if defined(cl_intel_subgroups) || defined(cl_khr_subgroups)
memory_scope_sub_group = __OPENCL_MEMORY_SCOPE_SUB_GROUP
#endif
} memory_scope;
void __ovld __conv work_group_barrier(cl_mem_fence_flags flags, memory_scope scope);
@@ -13952,11 +13639,11 @@ unsigned long __ovld atom_xor(volatile __local unsigned long *p, unsigned long v
// enum values aligned with what clang uses in EmitAtomicExpr()
typedef enum memory_order
{
memory_order_relaxed,
memory_order_acquire,
memory_order_release,
memory_order_acq_rel,
memory_order_seq_cst
memory_order_relaxed = __ATOMIC_RELAXED,
memory_order_acquire = __ATOMIC_ACQUIRE,
memory_order_release = __ATOMIC_RELEASE,
memory_order_acq_rel = __ATOMIC_ACQ_REL,
memory_order_seq_cst = __ATOMIC_SEQ_CST
} memory_order;
// double atomics support requires extensions cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics
@@ -16199,6 +15886,313 @@ double __ovld __conv sub_group_scan_inclusive_max(double x);
#endif //cl_khr_subgroups cl_intel_subgroups
#if defined(cl_intel_subgroups)
// Intel-Specific Sub Group Functions
float __ovld __conv intel_sub_group_shuffle( float x, uint c );
float2 __ovld __conv intel_sub_group_shuffle( float2 x, uint c );
float3 __ovld __conv intel_sub_group_shuffle( float3 x, uint c );
float4 __ovld __conv intel_sub_group_shuffle( float4 x, uint c );
float8 __ovld __conv intel_sub_group_shuffle( float8 x, uint c );
float16 __ovld __conv intel_sub_group_shuffle( float16 x, uint c );
int __ovld __conv intel_sub_group_shuffle( int x, uint c );
int2 __ovld __conv intel_sub_group_shuffle( int2 x, uint c );
int3 __ovld __conv intel_sub_group_shuffle( int3 x, uint c );
int4 __ovld __conv intel_sub_group_shuffle( int4 x, uint c );
int8 __ovld __conv intel_sub_group_shuffle( int8 x, uint c );
int16 __ovld __conv intel_sub_group_shuffle( int16 x, uint c );
uint __ovld __conv intel_sub_group_shuffle( uint x, uint c );
uint2 __ovld __conv intel_sub_group_shuffle( uint2 x, uint c );
uint3 __ovld __conv intel_sub_group_shuffle( uint3 x, uint c );
uint4 __ovld __conv intel_sub_group_shuffle( uint4 x, uint c );
uint8 __ovld __conv intel_sub_group_shuffle( uint8 x, uint c );
uint16 __ovld __conv intel_sub_group_shuffle( uint16 x, uint c );
long __ovld __conv intel_sub_group_shuffle( long x, uint c );
ulong __ovld __conv intel_sub_group_shuffle( ulong x, uint c );
float __ovld __conv intel_sub_group_shuffle_down( float cur, float next, uint c );
float2 __ovld __conv intel_sub_group_shuffle_down( float2 cur, float2 next, uint c );
float3 __ovld __conv intel_sub_group_shuffle_down( float3 cur, float3 next, uint c );
float4 __ovld __conv intel_sub_group_shuffle_down( float4 cur, float4 next, uint c );
float8 __ovld __conv intel_sub_group_shuffle_down( float8 cur, float8 next, uint c );
float16 __ovld __conv intel_sub_group_shuffle_down( float16 cur, float16 next, uint c );
int __ovld __conv intel_sub_group_shuffle_down( int cur, int next, uint c );
int2 __ovld __conv intel_sub_group_shuffle_down( int2 cur, int2 next, uint c );
int3 __ovld __conv intel_sub_group_shuffle_down( int3 cur, int3 next, uint c );
int4 __ovld __conv intel_sub_group_shuffle_down( int4 cur, int4 next, uint c );
int8 __ovld __conv intel_sub_group_shuffle_down( int8 cur, int8 next, uint c );
int16 __ovld __conv intel_sub_group_shuffle_down( int16 cur, int16 next, uint c );
uint __ovld __conv intel_sub_group_shuffle_down( uint cur, uint next, uint c );
uint2 __ovld __conv intel_sub_group_shuffle_down( uint2 cur, uint2 next, uint c );
uint3 __ovld __conv intel_sub_group_shuffle_down( uint3 cur, uint3 next, uint c );
uint4 __ovld __conv intel_sub_group_shuffle_down( uint4 cur, uint4 next, uint c );
uint8 __ovld __conv intel_sub_group_shuffle_down( uint8 cur, uint8 next, uint c );
uint16 __ovld __conv intel_sub_group_shuffle_down( uint16 cur, uint16 next, uint c );
long __ovld __conv intel_sub_group_shuffle_down( long prev, long cur, uint c );
ulong __ovld __conv intel_sub_group_shuffle_down( ulong prev, ulong cur, uint c );
float __ovld __conv intel_sub_group_shuffle_up( float prev, float cur, uint c );
float2 __ovld __conv intel_sub_group_shuffle_up( float2 prev, float2 cur, uint c );
float3 __ovld __conv intel_sub_group_shuffle_up( float3 prev, float3 cur, uint c );
float4 __ovld __conv intel_sub_group_shuffle_up( float4 prev, float4 cur, uint c );
float8 __ovld __conv intel_sub_group_shuffle_up( float8 prev, float8 cur, uint c );
float16 __ovld __conv intel_sub_group_shuffle_up( float16 prev, float16 cur, uint c );
int __ovld __conv intel_sub_group_shuffle_up( int prev, int cur, uint c );
int2 __ovld __conv intel_sub_group_shuffle_up( int2 prev, int2 cur, uint c );
int3 __ovld __conv intel_sub_group_shuffle_up( int3 prev, int3 cur, uint c );
int4 __ovld __conv intel_sub_group_shuffle_up( int4 prev, int4 cur, uint c );
int8 __ovld __conv intel_sub_group_shuffle_up( int8 prev, int8 cur, uint c );
int16 __ovld __conv intel_sub_group_shuffle_up( int16 prev, int16 cur, uint c );
uint __ovld __conv intel_sub_group_shuffle_up( uint prev, uint cur, uint c );
uint2 __ovld __conv intel_sub_group_shuffle_up( uint2 prev, uint2 cur, uint c );
uint3 __ovld __conv intel_sub_group_shuffle_up( uint3 prev, uint3 cur, uint c );
uint4 __ovld __conv intel_sub_group_shuffle_up( uint4 prev, uint4 cur, uint c );
uint8 __ovld __conv intel_sub_group_shuffle_up( uint8 prev, uint8 cur, uint c );
uint16 __ovld __conv intel_sub_group_shuffle_up( uint16 prev, uint16 cur, uint c );
long __ovld __conv intel_sub_group_shuffle_up( long prev, long cur, uint c );
ulong __ovld __conv intel_sub_group_shuffle_up( ulong prev, ulong cur, uint c );
float __ovld __conv intel_sub_group_shuffle_xor( float x, uint c );
float2 __ovld __conv intel_sub_group_shuffle_xor( float2 x, uint c );
float3 __ovld __conv intel_sub_group_shuffle_xor( float3 x, uint c );
float4 __ovld __conv intel_sub_group_shuffle_xor( float4 x, uint c );
float8 __ovld __conv intel_sub_group_shuffle_xor( float8 x, uint c );
float16 __ovld __conv intel_sub_group_shuffle_xor( float16 x, uint c );
int __ovld __conv intel_sub_group_shuffle_xor( int x, uint c );
int2 __ovld __conv intel_sub_group_shuffle_xor( int2 x, uint c );
int3 __ovld __conv intel_sub_group_shuffle_xor( int3 x, uint c );
int4 __ovld __conv intel_sub_group_shuffle_xor( int4 x, uint c );
int8 __ovld __conv intel_sub_group_shuffle_xor( int8 x, uint c );
int16 __ovld __conv intel_sub_group_shuffle_xor( int16 x, uint c );
uint __ovld __conv intel_sub_group_shuffle_xor( uint x, uint c );
uint2 __ovld __conv intel_sub_group_shuffle_xor( uint2 x, uint c );
uint3 __ovld __conv intel_sub_group_shuffle_xor( uint3 x, uint c );
uint4 __ovld __conv intel_sub_group_shuffle_xor( uint4 x, uint c );
uint8 __ovld __conv intel_sub_group_shuffle_xor( uint8 x, uint c );
uint16 __ovld __conv intel_sub_group_shuffle_xor( uint16 x, uint c );
long __ovld __conv intel_sub_group_shuffle_xor( long x, uint c );
ulong __ovld __conv intel_sub_group_shuffle_xor( ulong x, uint c );
uint __ovld __conv intel_sub_group_block_read( read_only image2d_t image, int2 coord );
uint2 __ovld __conv intel_sub_group_block_read2( read_only image2d_t image, int2 coord );
uint4 __ovld __conv intel_sub_group_block_read4( read_only image2d_t image, int2 coord );
uint8 __ovld __conv intel_sub_group_block_read8( read_only image2d_t image, int2 coord );
#if (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
uint __ovld __conv intel_sub_group_block_read(read_write image2d_t image, int2 coord);
uint2 __ovld __conv intel_sub_group_block_read2(read_write image2d_t image, int2 coord);
uint4 __ovld __conv intel_sub_group_block_read4(read_write image2d_t image, int2 coord);
uint8 __ovld __conv intel_sub_group_block_read8(read_write image2d_t image, int2 coord);
#endif // (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
uint __ovld __conv intel_sub_group_block_read( const __global uint* p );
uint2 __ovld __conv intel_sub_group_block_read2( const __global uint* p );
uint4 __ovld __conv intel_sub_group_block_read4( const __global uint* p );
uint8 __ovld __conv intel_sub_group_block_read8( const __global uint* p );
void __ovld __conv intel_sub_group_block_write(write_only image2d_t image, int2 coord, uint data);
void __ovld __conv intel_sub_group_block_write2(write_only image2d_t image, int2 coord, uint2 data);
void __ovld __conv intel_sub_group_block_write4(write_only image2d_t image, int2 coord, uint4 data);
void __ovld __conv intel_sub_group_block_write8(write_only image2d_t image, int2 coord, uint8 data);
#if (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
void __ovld __conv intel_sub_group_block_write(read_write image2d_t image, int2 coord, uint data);
void __ovld __conv intel_sub_group_block_write2(read_write image2d_t image, int2 coord, uint2 data);
void __ovld __conv intel_sub_group_block_write4(read_write image2d_t image, int2 coord, uint4 data);
void __ovld __conv intel_sub_group_block_write8(read_write image2d_t image, int2 coord, uint8 data);
#endif // (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
void __ovld __conv intel_sub_group_block_write( __global uint* p, uint data );
void __ovld __conv intel_sub_group_block_write2( __global uint* p, uint2 data );
void __ovld __conv intel_sub_group_block_write4( __global uint* p, uint4 data );
void __ovld __conv intel_sub_group_block_write8( __global uint* p, uint8 data );
#ifdef cl_khr_fp16
half __ovld __conv intel_sub_group_shuffle( half x, uint c );
half __ovld __conv intel_sub_group_shuffle_down( half prev, half cur, uint c );
half __ovld __conv intel_sub_group_shuffle_up( half prev, half cur, uint c );
half __ovld __conv intel_sub_group_shuffle_xor( half x, uint c );
#endif
#if defined(cl_khr_fp64)
double __ovld __conv intel_sub_group_shuffle( double x, uint c );
double __ovld __conv intel_sub_group_shuffle_down( double prev, double cur, uint c );
double __ovld __conv intel_sub_group_shuffle_up( double prev, double cur, uint c );
double __ovld __conv intel_sub_group_shuffle_xor( double x, uint c );
#endif
#endif //cl_intel_subgroups
#if defined(cl_intel_subgroups_short)
short __ovld __conv intel_sub_group_broadcast( short x, uint sub_group_local_id );
short2 __ovld __conv intel_sub_group_broadcast( short2 x, uint sub_group_local_id );
short3 __ovld __conv intel_sub_group_broadcast( short3 x, uint sub_group_local_id );
short4 __ovld __conv intel_sub_group_broadcast( short4 x, uint sub_group_local_id );
short8 __ovld __conv intel_sub_group_broadcast( short8 x, uint sub_group_local_id );
ushort __ovld __conv intel_sub_group_broadcast( ushort x, uint sub_group_local_id );
ushort2 __ovld __conv intel_sub_group_broadcast( ushort2 x, uint sub_group_local_id );
ushort3 __ovld __conv intel_sub_group_broadcast( ushort3 x, uint sub_group_local_id );
ushort4 __ovld __conv intel_sub_group_broadcast( ushort4 x, uint sub_group_local_id );
ushort8 __ovld __conv intel_sub_group_broadcast( ushort8 x, uint sub_group_local_id );
short __ovld __conv intel_sub_group_shuffle( short x, uint c );
short2 __ovld __conv intel_sub_group_shuffle( short2 x, uint c );
short3 __ovld __conv intel_sub_group_shuffle( short3 x, uint c );
short4 __ovld __conv intel_sub_group_shuffle( short4 x, uint c );
short8 __ovld __conv intel_sub_group_shuffle( short8 x, uint c );
short16 __ovld __conv intel_sub_group_shuffle( short16 x, uint c);
ushort __ovld __conv intel_sub_group_shuffle( ushort x, uint c );
ushort2 __ovld __conv intel_sub_group_shuffle( ushort2 x, uint c );
ushort3 __ovld __conv intel_sub_group_shuffle( ushort3 x, uint c );
ushort4 __ovld __conv intel_sub_group_shuffle( ushort4 x, uint c );
ushort8 __ovld __conv intel_sub_group_shuffle( ushort8 x, uint c );
ushort16 __ovld __conv intel_sub_group_shuffle( ushort16 x, uint c );
short __ovld __conv intel_sub_group_shuffle_down( short cur, short next, uint c );
short2 __ovld __conv intel_sub_group_shuffle_down( short2 cur, short2 next, uint c );
short3 __ovld __conv intel_sub_group_shuffle_down( short3 cur, short3 next, uint c );
short4 __ovld __conv intel_sub_group_shuffle_down( short4 cur, short4 next, uint c );
short8 __ovld __conv intel_sub_group_shuffle_down( short8 cur, short8 next, uint c );
short16 __ovld __conv intel_sub_group_shuffle_down( short16 cur, short16 next, uint c );
ushort __ovld __conv intel_sub_group_shuffle_down( ushort cur, ushort next, uint c );
ushort2 __ovld __conv intel_sub_group_shuffle_down( ushort2 cur, ushort2 next, uint c );
ushort3 __ovld __conv intel_sub_group_shuffle_down( ushort3 cur, ushort3 next, uint c );
ushort4 __ovld __conv intel_sub_group_shuffle_down( ushort4 cur, ushort4 next, uint c );
ushort8 __ovld __conv intel_sub_group_shuffle_down( ushort8 cur, ushort8 next, uint c );
ushort16 __ovld __conv intel_sub_group_shuffle_down( ushort16 cur, ushort16 next, uint c );
short __ovld __conv intel_sub_group_shuffle_up( short cur, short next, uint c );
short2 __ovld __conv intel_sub_group_shuffle_up( short2 cur, short2 next, uint c );
short3 __ovld __conv intel_sub_group_shuffle_up( short3 cur, short3 next, uint c );
short4 __ovld __conv intel_sub_group_shuffle_up( short4 cur, short4 next, uint c );
short8 __ovld __conv intel_sub_group_shuffle_up( short8 cur, short8 next, uint c );
short16 __ovld __conv intel_sub_group_shuffle_up( short16 cur, short16 next, uint c );
ushort __ovld __conv intel_sub_group_shuffle_up( ushort cur, ushort next, uint c );
ushort2 __ovld __conv intel_sub_group_shuffle_up( ushort2 cur, ushort2 next, uint c );
ushort3 __ovld __conv intel_sub_group_shuffle_up( ushort3 cur, ushort3 next, uint c );
ushort4 __ovld __conv intel_sub_group_shuffle_up( ushort4 cur, ushort4 next, uint c );
ushort8 __ovld __conv intel_sub_group_shuffle_up( ushort8 cur, ushort8 next, uint c );
ushort16 __ovld __conv intel_sub_group_shuffle_up( ushort16 cur, ushort16 next, uint c );
short __ovld __conv intel_sub_group_shuffle_xor( short x, uint c );
short2 __ovld __conv intel_sub_group_shuffle_xor( short2 x, uint c );
short3 __ovld __conv intel_sub_group_shuffle_xor( short3 x, uint c );
short4 __ovld __conv intel_sub_group_shuffle_xor( short4 x, uint c );
short8 __ovld __conv intel_sub_group_shuffle_xor( short8 x, uint c );
short16 __ovld __conv intel_sub_group_shuffle_xor( short16 x, uint c );
ushort __ovld __conv intel_sub_group_shuffle_xor( ushort x, uint c );
ushort2 __ovld __conv intel_sub_group_shuffle_xor( ushort2 x, uint c );
ushort3 __ovld __conv intel_sub_group_shuffle_xor( ushort3 x, uint c );
ushort4 __ovld __conv intel_sub_group_shuffle_xor( ushort4 x, uint c );
ushort8 __ovld __conv intel_sub_group_shuffle_xor( ushort8 x, uint c );
ushort16 __ovld __conv intel_sub_group_shuffle_xor( ushort16 x, uint c );
short __ovld __conv intel_sub_group_reduce_add( short x );
ushort __ovld __conv intel_sub_group_reduce_add( ushort x );
short __ovld __conv intel_sub_group_reduce_min( short x );
ushort __ovld __conv intel_sub_group_reduce_min( ushort x );
short __ovld __conv intel_sub_group_reduce_max( short x );
ushort __ovld __conv intel_sub_group_reduce_max( ushort x );
short __ovld __conv intel_sub_group_scan_exclusive_add( short x );
ushort __ovld __conv intel_sub_group_scan_exclusive_add( ushort x );
short __ovld __conv intel_sub_group_scan_exclusive_min( short x );
ushort __ovld __conv intel_sub_group_scan_exclusive_min( ushort x );
short __ovld __conv intel_sub_group_scan_exclusive_max( short x );
ushort __ovld __conv intel_sub_group_scan_exclusive_max( ushort x );
short __ovld __conv intel_sub_group_scan_inclusive_add( short x );
ushort __ovld __conv intel_sub_group_scan_inclusive_add( ushort x );
short __ovld __conv intel_sub_group_scan_inclusive_min( short x );
ushort __ovld __conv intel_sub_group_scan_inclusive_min( ushort x );
short __ovld __conv intel_sub_group_scan_inclusive_max( short x );
ushort __ovld __conv intel_sub_group_scan_inclusive_max( ushort x );
uint __ovld __conv intel_sub_group_block_read_ui( read_only image2d_t image, int2 byte_coord );
uint2 __ovld __conv intel_sub_group_block_read_ui2( read_only image2d_t image, int2 byte_coord );
uint4 __ovld __conv intel_sub_group_block_read_ui4( read_only image2d_t image, int2 byte_coord );
uint8 __ovld __conv intel_sub_group_block_read_ui8( read_only image2d_t image, int2 byte_coord );
#if (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
uint __ovld __conv intel_sub_group_block_read_ui( read_write image2d_t image, int2 byte_coord );
uint2 __ovld __conv intel_sub_group_block_read_ui2( read_write image2d_t image, int2 byte_coord );
uint4 __ovld __conv intel_sub_group_block_read_ui4( read_write image2d_t image, int2 byte_coord );
uint8 __ovld __conv intel_sub_group_block_read_ui8( read_write image2d_t image, int2 byte_coord );
#endif // (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
uint __ovld __conv intel_sub_group_block_read_ui( const __global uint* p );
uint2 __ovld __conv intel_sub_group_block_read_ui2( const __global uint* p );
uint4 __ovld __conv intel_sub_group_block_read_ui4( const __global uint* p );
uint8 __ovld __conv intel_sub_group_block_read_ui8( const __global uint* p );
void __ovld __conv intel_sub_group_block_write_ui( read_only image2d_t image, int2 byte_coord, uint data );
void __ovld __conv intel_sub_group_block_write_ui2( read_only image2d_t image, int2 byte_coord, uint2 data );
void __ovld __conv intel_sub_group_block_write_ui4( read_only image2d_t image, int2 byte_coord, uint4 data );
void __ovld __conv intel_sub_group_block_write_ui8( read_only image2d_t image, int2 byte_coord, uint8 data );
#if (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
void __ovld __conv intel_sub_group_block_write_ui( read_write image2d_t image, int2 byte_coord, uint data );
void __ovld __conv intel_sub_group_block_write_ui2( read_write image2d_t image, int2 byte_coord, uint2 data );
void __ovld __conv intel_sub_group_block_write_ui4( read_write image2d_t image, int2 byte_coord, uint4 data );
void __ovld __conv intel_sub_group_block_write_ui8( read_write image2d_t image, int2 byte_coord, uint8 data );
#endif // (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
void __ovld __conv intel_sub_group_block_write_ui( __global uint* p, uint data );
void __ovld __conv intel_sub_group_block_write_ui2( __global uint* p, uint2 data );
void __ovld __conv intel_sub_group_block_write_ui4( __global uint* p, uint4 data );
void __ovld __conv intel_sub_group_block_write_ui8( __global uint* p, uint8 data );
ushort __ovld __conv intel_sub_group_block_read_us( read_only image2d_t image, int2 coord );
ushort2 __ovld __conv intel_sub_group_block_read_us2( read_only image2d_t image, int2 coord );
ushort4 __ovld __conv intel_sub_group_block_read_us4( read_only image2d_t image, int2 coord );
ushort8 __ovld __conv intel_sub_group_block_read_us8( read_only image2d_t image, int2 coord );
#if (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
ushort __ovld __conv intel_sub_group_block_read_us(read_write image2d_t image, int2 coord);
ushort2 __ovld __conv intel_sub_group_block_read_us2(read_write image2d_t image, int2 coord);
ushort4 __ovld __conv intel_sub_group_block_read_us4(read_write image2d_t image, int2 coord);
ushort8 __ovld __conv intel_sub_group_block_read_us8(read_write image2d_t image, int2 coord);
#endif // (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
ushort __ovld __conv intel_sub_group_block_read_us( const __global ushort* p );
ushort2 __ovld __conv intel_sub_group_block_read_us2( const __global ushort* p );
ushort4 __ovld __conv intel_sub_group_block_read_us4( const __global ushort* p );
ushort8 __ovld __conv intel_sub_group_block_read_us8( const __global ushort* p );
void __ovld __conv intel_sub_group_block_write_us(write_only image2d_t image, int2 coord, ushort data);
void __ovld __conv intel_sub_group_block_write_us2(write_only image2d_t image, int2 coord, ushort2 data);
void __ovld __conv intel_sub_group_block_write_us4(write_only image2d_t image, int2 coord, ushort4 data);
void __ovld __conv intel_sub_group_block_write_us8(write_only image2d_t image, int2 coord, ushort8 data);
#if (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
void __ovld __conv intel_sub_group_block_write_us(read_write image2d_t image, int2 coord, ushort data);
void __ovld __conv intel_sub_group_block_write_us2(read_write image2d_t image, int2 coord, ushort2 data);
void __ovld __conv intel_sub_group_block_write_us4(read_write image2d_t image, int2 coord, ushort4 data);
void __ovld __conv intel_sub_group_block_write_us8(read_write image2d_t image, int2 coord, ushort8 data);
#endif // (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
void __ovld __conv intel_sub_group_block_write_us( __global ushort* p, ushort data );
void __ovld __conv intel_sub_group_block_write_us2( __global ushort* p, ushort2 data );
void __ovld __conv intel_sub_group_block_write_us4( __global ushort* p, ushort4 data );
void __ovld __conv intel_sub_group_block_write_us8( __global ushort* p, ushort8 data );
#endif // cl_intel_subgroups_short
#ifdef cl_amd_media_ops
uint __ovld amd_bitalign(uint a, uint b, uint c);
uint2 __ovld amd_bitalign(uint2 a, uint2 b, uint2 c);


@@ -115,8 +115,8 @@ _mm_hsub_ps(__m128 __a, __m128 __b)
return __builtin_ia32_hsubps((__v4sf)__a, (__v4sf)__b);
}
/// \brief Moves and duplicates high-order (odd-indexed) values from a 128-bit
/// vector of [4 x float] to float values stored in a 128-bit vector of
/// \brief Moves and duplicates odd-indexed values from a 128-bit vector
/// of [4 x float] to float values stored in a 128-bit vector of
/// [4 x float].
///
/// \headerfile <x86intrin.h>
@@ -137,7 +137,7 @@ _mm_movehdup_ps(__m128 __a)
return __builtin_shufflevector((__v4sf)__a, (__v4sf)__a, 1, 1, 3, 3);
}
/// \brief Duplicates low-order (even-indexed) values from a 128-bit vector of
/// \brief Duplicates even-indexed values from a 128-bit vector of
/// [4 x float] to float values stored in a 128-bit vector of [4 x float].
///
/// \headerfile <x86intrin.h>
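
As a rough scalar model of the two duplication intrinsics whose doc comments are reworded in this hunk, the sketch below spells out which lanes end up where. `movehdup_model` mirrors the `1, 1, 3, 3` shuffle indices visible above; `moveldup_model` uses `0, 0, 2, 2`, which is inferred from the "even-indexed" wording rather than from code shown in this diff. The `vec4f` type and all function names are hypothetical helpers for illustration, not part of `pmmintrin.h`.

```c
#include <assert.h>

/* Hypothetical stand-in for a 128-bit vector of [4 x float]. */
typedef struct { float v[4]; } vec4f;

/* Scalar model of _mm_movehdup_ps: each pair of lanes receives the
   odd-indexed value of that pair (shuffle indices 1, 1, 3, 3). */
static vec4f movehdup_model(vec4f a) {
  vec4f r = {{ a.v[1], a.v[1], a.v[3], a.v[3] }};
  return r;
}

/* Scalar model of _mm_moveldup_ps: each pair of lanes receives the
   even-indexed value of that pair (assumed indices 0, 0, 2, 2). */
static vec4f moveldup_model(vec4f a) {
  vec4f r = {{ a.v[0], a.v[0], a.v[2], a.v[2] }};
  return r;
}

/* Small self-check that doubles as a usage example. */
void movedup_self_check(void) {
  vec4f a = {{ 10.0f, 11.0f, 12.0f, 13.0f }};
  vec4f h = movehdup_model(a);
  vec4f l = moveldup_model(a);
  assert(h.v[0] == 11.0f && h.v[2] == 13.0f);
  assert(l.v[1] == 10.0f && l.v[3] == 12.0f);
}
```

Read this way, "odd-indexed" and "even-indexed" describe the source lanes directly, which is presumably why the "high-order"/"low-order" wording was dropped.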


@@ -648,7 +648,7 @@ _mm_mul_epi32 (__m128i __V1, __m128i __V2)
/// input vectors are used as an input for dot product; otherwise that input
/// is treated as zero. Bits [1:0] determine which elements of the result
/// will receive a copy of the final dot product, with bit [0] corresponding
/// to the lowest element and bit [3] corresponding to the highest element of
/// to the lowest element and bit [1] corresponding to the highest element of
/// each [2 x double] vector. If a bit is set, the dot product is returned in
/// the corresponding element; otherwise that element is set to zero.
#define _mm_dp_pd(X, Y, M) __extension__ ({\
@@ -866,8 +866,8 @@ _mm_max_epu32 (__m128i __V1, __m128i __V2)
/// 11: Copies the selected bits from \a Y to result bits [127:96]. \n
/// Bits[3:0]: If any of these bits are set, the corresponding result
/// element is cleared.
/// \returns A 128-bit vector of [4 x float] containing the copied single-
/// precision floating point elements from the operands.
/// \returns A 128-bit vector of [4 x float] containing the copied
/// single-precision floating point elements from the operands.
#define _mm_insert_ps(X, Y, N) __builtin_ia32_insertps128((X), (Y), (N))
/// \brief Extracts a 32-bit integer from a 128-bit vector of [4 x float] and
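
The corrected `_mm_dp_pd` comment above (bit [1], not bit [3], addresses the highest element of a [2 x double] vector) can be illustrated with a scalar sketch. The result-select bits [1:0] come from the comment in this hunk; the input-select bits [5:4] are not visible here and are taken from the DPPD instruction definition, so treat that part as an assumption. `vec2d` and the function names are hypothetical, not part of `smmintrin.h`.

```c
#include <assert.h>

/* Hypothetical stand-in for a 128-bit vector of [2 x double]. */
typedef struct { double v[2]; } vec2d;

/* Scalar model of _mm_dp_pd(x, y, m):
   bits [5:4] of m choose which element products enter the sum
   (assumed from the DPPD definition), and bits [1:0] choose which
   result elements receive the sum; unselected elements become 0. */
static vec2d dp_pd_model(vec2d x, vec2d y, int m) {
  double sum = 0.0;
  vec2d r;
  if (m & 0x10) sum += x.v[0] * y.v[0];
  if (m & 0x20) sum += x.v[1] * y.v[1];
  r.v[0] = (m & 0x1) ? sum : 0.0;  /* bit [0]: lowest element  */
  r.v[1] = (m & 0x2) ? sum : 0.0;  /* bit [1]: highest element */
  return r;
}

/* Small self-check that doubles as a usage example. */
void dp_pd_self_check(void) {
  vec2d x = {{ 3.0, 4.0 }};
  vec2d y = {{ 5.0, 6.0 }};
  vec2d r = dp_pd_model(x, y, 0x31);  /* both products, low lane only */
  assert(r.v[0] == 39.0 && r.v[1] == 0.0);
}
```

With only two elements per vector there is no bit [3] to speak of for the result, which is the point of the one-character fix in the comment.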


@@ -26,10 +26,14 @@
#ifndef __STDARG_H
#define __STDARG_H
/* zig: added because macos _va_list.h was duplicately defining va_list
*/
#ifndef _VA_LIST
#ifndef _VA_LIST_T
typedef __builtin_va_list va_list;
#define _VA_LIST
#endif
#endif
#define va_start(ap, param) __builtin_va_start(ap, param)
#define va_end(ap) __builtin_va_end(ap)
#define va_arg(ap, type) __builtin_va_arg(ap, type)


@@ -32,12 +32,15 @@
#define true 1
#define false 0
#elif defined(__GNUC__) && !defined(__STRICT_ANSI__)
/* Define _Bool, bool, false, true as a GNU extension. */
/* Define _Bool as a GNU extension. */
#define _Bool bool
#if __cplusplus < 201103L
/* For C++98, define bool, false, true as a GNU extension. */
#define bool bool
#define false false
#define true true
#endif
#endif
#define __bool_true_false_are_defined 1


@@ -76,7 +76,13 @@ typedef intptr_t _sleb128_t;
typedef uintptr_t _uleb128_t;
struct _Unwind_Context;
#if defined(__arm__) && !(defined(__USING_SJLJ_EXCEPTIONS__) || defined(__ARM_DWARF_EH__))
struct _Unwind_Control_Block;
typedef struct _Unwind_Control_Block _Unwind_Exception; /* Alias */
#else
struct _Unwind_Exception;
typedef struct _Unwind_Exception _Unwind_Exception;
#endif
typedef enum {
_URC_NO_REASON = 0,
#if defined(__arm__) && !defined(__USING_SJLJ_EXCEPTIONS__) && \
@@ -109,8 +115,42 @@ typedef enum {
} _Unwind_Action;
typedef void (*_Unwind_Exception_Cleanup_Fn)(_Unwind_Reason_Code,
struct _Unwind_Exception *);
_Unwind_Exception *);
#if defined(__arm__) && !(defined(__USING_SJLJ_EXCEPTIONS__) || defined(__ARM_DWARF_EH__))
typedef struct _Unwind_Control_Block _Unwind_Control_Block;
typedef uint32_t _Unwind_EHT_Header;
struct _Unwind_Control_Block {
uint64_t exception_class;
void (*exception_cleanup)(_Unwind_Reason_Code, _Unwind_Control_Block *);
/* unwinder cache (private fields for the unwinder's use) */
struct {
uint32_t reserved1; /* forced unwind stop function, 0 if not forced */
uint32_t reserved2; /* personality routine */
uint32_t reserved3; /* callsite */
uint32_t reserved4; /* forced unwind stop argument */
uint32_t reserved5;
} unwinder_cache;
/* propagation barrier cache (valid after phase 1) */
struct {
uint32_t sp;
uint32_t bitpattern[5];
} barrier_cache;
/* cleanup cache (preserved over cleanup) */
struct {
uint32_t bitpattern[4];
} cleanup_cache;
/* personality cache (for personality's benefit) */
struct {
uint32_t fnstart; /* function start address */
_Unwind_EHT_Header *ehtp; /* pointer to EHT entry header word */
uint32_t additional; /* additional data */
uint32_t reserved1;
} pr_cache;
long long int : 0; /* force alignment of next item to 8-byte boundary */
} __attribute__((__aligned__(8)));
#else
struct _Unwind_Exception {
_Unwind_Exception_Class exception_class;
_Unwind_Exception_Cleanup_Fn exception_cleanup;
@@ -120,23 +160,24 @@ struct _Unwind_Exception {
* aligned". GCC has interpreted this to mean "use the maximum useful
* alignment for the target"; so do we. */
} __attribute__((__aligned__));
#endif
typedef _Unwind_Reason_Code (*_Unwind_Stop_Fn)(int, _Unwind_Action,
_Unwind_Exception_Class,
struct _Unwind_Exception *,
_Unwind_Exception *,
struct _Unwind_Context *,
void *);
typedef _Unwind_Reason_Code (*_Unwind_Personality_Fn)(
int, _Unwind_Action, _Unwind_Exception_Class, struct _Unwind_Exception *,
struct _Unwind_Context *);
typedef _Unwind_Reason_Code (*_Unwind_Personality_Fn)(int, _Unwind_Action,
_Unwind_Exception_Class,
_Unwind_Exception *,
struct _Unwind_Context *);
typedef _Unwind_Personality_Fn __personality_routine;
typedef _Unwind_Reason_Code (*_Unwind_Trace_Fn)(struct _Unwind_Context *,
void *);
#if defined(__arm__) && !defined(__APPLE__)
#if defined(__arm__) && !(defined(__USING_SJLJ_EXCEPTIONS__) || defined(__ARM_DWARF_EH__))
typedef enum {
_UVRSC_CORE = 0, /* integer register */
_UVRSC_VFP = 1, /* vfp */
@@ -158,14 +199,12 @@ typedef enum {
_UVRSR_FAILED = 2
} _Unwind_VRS_Result;
#if !defined(__USING_SJLJ_EXCEPTIONS__) && !defined(__ARM_DWARF_EH__)
typedef uint32_t _Unwind_State;
#define _US_VIRTUAL_UNWIND_FRAME ((_Unwind_State)0)
#define _US_UNWIND_FRAME_STARTING ((_Unwind_State)1)
#define _US_UNWIND_FRAME_RESUME ((_Unwind_State)2)
#define _US_ACTION_MASK ((_Unwind_State)3)
#define _US_FORCE_UNWIND ((_Unwind_State)8)
#endif
_Unwind_VRS_Result _Unwind_VRS_Get(struct _Unwind_Context *__context,
_Unwind_VRS_RegClass __regclass,
@@ -224,13 +263,12 @@ _Unwind_Ptr _Unwind_GetRegionStart(struct _Unwind_Context *);
/* DWARF EH functions; currently not available on Darwin/ARM */
#if !defined(__APPLE__) || !defined(__arm__)
_Unwind_Reason_Code _Unwind_RaiseException(struct _Unwind_Exception *);
_Unwind_Reason_Code _Unwind_ForcedUnwind(struct _Unwind_Exception *,
_Unwind_Stop_Fn, void *);
void _Unwind_DeleteException(struct _Unwind_Exception *);
void _Unwind_Resume(struct _Unwind_Exception *);
_Unwind_Reason_Code _Unwind_Resume_or_Rethrow(struct _Unwind_Exception *);
_Unwind_Reason_Code _Unwind_RaiseException(_Unwind_Exception *);
_Unwind_Reason_Code _Unwind_ForcedUnwind(_Unwind_Exception *, _Unwind_Stop_Fn,
void *);
void _Unwind_DeleteException(_Unwind_Exception *);
void _Unwind_Resume(_Unwind_Exception *);
_Unwind_Reason_Code _Unwind_Resume_or_Rethrow(_Unwind_Exception *);
#endif
@@ -241,11 +279,11 @@ typedef struct SjLj_Function_Context *_Unwind_FunctionContext_t;
void _Unwind_SjLj_Register(_Unwind_FunctionContext_t);
void _Unwind_SjLj_Unregister(_Unwind_FunctionContext_t);
_Unwind_Reason_Code _Unwind_SjLj_RaiseException(struct _Unwind_Exception *);
_Unwind_Reason_Code _Unwind_SjLj_ForcedUnwind(struct _Unwind_Exception *,
_Unwind_Reason_Code _Unwind_SjLj_RaiseException(_Unwind_Exception *);
_Unwind_Reason_Code _Unwind_SjLj_ForcedUnwind(_Unwind_Exception *,
_Unwind_Stop_Fn, void *);
void _Unwind_SjLj_Resume(struct _Unwind_Exception *);
_Unwind_Reason_Code _Unwind_SjLj_Resume_or_Rethrow(struct _Unwind_Exception *);
void _Unwind_SjLj_Resume(_Unwind_Exception *);
_Unwind_Reason_Code _Unwind_SjLj_Resume_or_Rethrow(_Unwind_Exception *);
void *_Unwind_FindEnclosingFunction(void *);

c_headers/vaesintrin.h

@@ -0,0 +1,98 @@
/*===------------------ vaesintrin.h - VAES intrinsics ---------------------===
*
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error "Never use <vaesintrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __VAESINTRIN_H
#define __VAESINTRIN_H
/* Default attributes for YMM forms. */
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("vaes")))
/* Default attributes for ZMM forms. */
#define __DEFAULT_FN_ATTRS_F __attribute__((__always_inline__, __nodebug__, __target__("avx512f,vaes")))
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_aesenc_epi128(__m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_aesenc256((__v4di) __A,
(__v4di) __B);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS_F
_mm512_aesenc_epi128(__m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_aesenc512((__v8di) __A,
(__v8di) __B);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_aesdec_epi128(__m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_aesdec256((__v4di) __A,
(__v4di) __B);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS_F
_mm512_aesdec_epi128(__m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_aesdec512((__v8di) __A,
(__v8di) __B);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_aesenclast_epi128(__m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_aesenclast256((__v4di) __A,
(__v4di) __B);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS_F
_mm512_aesenclast_epi128(__m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_aesenclast512((__v8di) __A,
(__v8di) __B);
}
static __inline__ __m256i __DEFAULT_FN_ATTRS
_mm256_aesdeclast_epi128(__m256i __A, __m256i __B)
{
return (__m256i) __builtin_ia32_aesdeclast256((__v4di) __A,
(__v4di) __B);
}
static __inline__ __m512i __DEFAULT_FN_ATTRS_F
_mm512_aesdeclast_epi128(__m512i __A, __m512i __B)
{
return (__m512i) __builtin_ia32_aesdeclast512((__v8di) __A,
(__v8di) __B);
}
#undef __DEFAULT_FN_ATTRS
#undef __DEFAULT_FN_ATTRS_F
#endif


@@ -0,0 +1,42 @@
/*===------------ vpclmulqdqintrin.h - VPCLMULQDQ intrinsics ---------------===
*
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*===-----------------------------------------------------------------------===
*/
#ifndef __IMMINTRIN_H
#error "Never use <vpclmulqdqintrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef __VPCLMULQDQINTRIN_H
#define __VPCLMULQDQINTRIN_H
#define _mm256_clmulepi64_epi128(A, B, I) __extension__ ({ \
(__m256i)__builtin_ia32_pclmulqdq256((__v4di)(__m256i)(A), \
(__v4di)(__m256i)(B), \
(char)(I)); })
#define _mm512_clmulepi64_epi128(A, B, I) __extension__ ({ \
(__m512i)__builtin_ia32_pclmulqdq512((__v8di)(__m512i)(A), \
(__v8di)(__m512i)(B), \
(char)(I)); })
#endif // __VPCLMULQDQINTRIN_H


@@ -2035,9 +2035,11 @@ _mm_storer_ps(float *__p, __m128 __a)
_mm_store_ps(__p, __a);
}
#define _MM_HINT_T0 3
#define _MM_HINT_T1 2
#define _MM_HINT_T2 1
#define _MM_HINT_ET0 7
#define _MM_HINT_ET1 6
#define _MM_HINT_T0 3
#define _MM_HINT_T1 2
#define _MM_HINT_T2 1
#define _MM_HINT_NTA 0
#ifndef _MSC_VER
@@ -2068,7 +2070,8 @@ _mm_storer_ps(float *__p, __m128 __a)
/// be generated. \n
/// _MM_HINT_T2: Move data using the T2 hint. The PREFETCHT2 instruction will
/// be generated.
#define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), 0, (sel)))
#define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), \
((sel) >> 2) & 1, (sel) & 0x3))
#endif
/// \brief Stores a 64-bit integer in the specified aligned memory location. To
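
The new `_mm_prefetch` macro in this hunk splits the hint into two `__builtin_prefetch` arguments: bit 2 becomes the read/write flag and the low two bits become the locality. The sketch below decodes the hint values listed above using the same expressions. The `MODEL_HINT_*` names and the helper functions are hypothetical and exist only for illustration; they are not part of `xmmintrin.h`.

```c
#include <assert.h>

/* Hint values copied from the #define lines in the hunk above. */
enum {
  MODEL_HINT_NTA = 0,
  MODEL_HINT_T2  = 1,
  MODEL_HINT_T1  = 2,
  MODEL_HINT_T0  = 3,
  MODEL_HINT_ET1 = 6,
  MODEL_HINT_ET0 = 7
};

/* Second argument of __builtin_prefetch: 1 = prepare for write. */
static int prefetch_rw(int sel) { return (sel >> 2) & 1; }

/* Third argument of __builtin_prefetch: temporal locality, 0..3. */
static int prefetch_locality(int sel) { return sel & 0x3; }

/* Small self-check that doubles as a usage example. */
void prefetch_self_check(void) {
  /* ET0 differs from T0 only in the write bit. */
  assert(prefetch_rw(MODEL_HINT_ET0) == 1);
  assert(prefetch_rw(MODEL_HINT_T0) == 0);
  assert(prefetch_locality(MODEL_HINT_ET0) ==
         prefetch_locality(MODEL_HINT_T0));
}
```

Under the old macro the second argument was always `0`, so the `ET0` and `ET1` hints added in this change could not have requested a write prefetch.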


@@ -8,6 +8,7 @@ SET "RELEASEDIR=zig-%ZIGVERSION%"
mkdir "%RELEASEDIR%"
move build-msvc-release\bin\zig.exe "%RELEASEDIR%"
move build-msvc-release\lib "%RELEASEDIR%"
move zig-cache\langref.html "%RELEASEDIR%"
SET "RELEASEZIP=zig-%ZIGVERSION%.zip"


@@ -6,4 +6,5 @@ build_script:
after_build:
- '%APPVEYOR_BUILD_FOLDER%\ci\appveyor\after_build.bat'
cache:
- 'llvm+clang-5.0.0-win64-msvc-release.tar.xz'
- 'llvm+clang-5.0.1-win64-msvc-release.tar.xz'
- 'llvm+clang-6.0.0-win64-msvc-release.tar.xz'


@@ -7,13 +7,16 @@ SET "PATH=C:\msys64\mingw64\bin;C:\msys64\usr\bin;%PATH%"
SET "MSYSTEM=MINGW64"
SET "APPVEYOR_CACHE_ENTRY_ZIP_ARGS=-m0=Copy"
bash -lc "cd ${APPVEYOR_BUILD_FOLDER} && if [ -s ""llvm+clang-5.0.0-win64-msvc-release.tar.xz"" ]; then echo 'skipping LLVM download'; else wget 'https://s3.amazonaws.com/superjoe/temp/llvm%%2bclang-5.0.0-win64-msvc-release.tar.xz'; fi && tar xf llvm+clang-5.0.0-win64-msvc-release.tar.xz" || exit /b
bash -lc "cd ${APPVEYOR_BUILD_FOLDER} && if [ -s ""llvm+clang-6.0.0-win64-msvc-release.tar.xz"" ]; then echo 'skipping LLVM download'; else wget 'https://s3.amazonaws.com/ziglang.org/deps/llvm%%2bclang-6.0.0-win64-msvc-release.tar.xz'; fi && tar xf llvm+clang-6.0.0-win64-msvc-release.tar.xz" || exit /b
SET "PATH=%PREVPATH%"
SET "MSYSTEM=%PREVMSYSTEM%"
SET "ZIGBUILDDIR=%APPVEYOR_BUILD_FOLDER%\build-msvc-release"
SET "ZIGPREFIXPATH=%APPVEYOR_BUILD_FOLDER%\llvm+clang-5.0.0-win64-msvc-release"
SET "ZIGPREFIXPATH=%APPVEYOR_BUILD_FOLDER%\llvm+clang-6.0.0-win64-msvc-release"
call "C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\SetEnv.cmd" /x64
call "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" x86_amd64
mkdir %ZIGBUILDDIR%
cd %ZIGBUILDDIR%
@@ -22,16 +25,4 @@ msbuild /p:Configuration=Release INSTALL.vcxproj || exit /b
bin\zig.exe build --build-file ..\build.zig test || exit /b
@echo "MSVC build succeeded, proceeding with MinGW build"
cd %APPVEYOR_BUILD_FOLDER%
SET "PATH=C:\msys64\mingw64\bin;C:\msys64\usr\bin;%PATH%"
SET "MSYSTEM=MINGW64"
bash -lc "pacman -Syu --needed --noconfirm"
bash -lc "pacman -Su --needed --noconfirm"
bash -lc "pacman -S --needed --noconfirm make mingw64/mingw-w64-x86_64-make mingw64/mingw-w64-x86_64-cmake mingw64/mingw-w64-x86_64-clang mingw64/mingw-w64-x86_64-llvm mingw64/mingw-w64-x86_64-lld mingw64/mingw-w64-x86_64-gcc"
bash -lc "cd ${APPVEYOR_BUILD_FOLDER} && mkdir build && cd build && cmake .. -G""MSYS Makefiles"" -DCMAKE_INSTALL_PREFIX=$(pwd) -DZIG_LIBC_LIB_DIR=$(dirname $(cc -print-file-name=crt1.o)) -DZIG_LIBC_INCLUDE_DIR=$(echo -n | cc -E -x c - -v 2>&1 | grep -B1 ""End of search list."" | head -n1 | cut -c 2- | sed ""s/ .*//"") -DZIG_LIBC_STATIC_LIB_DIR=$(dirname $(cc -print-file-name=crtbegin.o)) && make && make install"
@echo "MinGW build successful"
@echo "MSVC build succeeded"


@@ -2,7 +2,7 @@
set -x
sudo sh -c 'echo "deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-5.0 main" >> /etc/apt/sources.list'
sudo sh -c 'echo "deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-6.0 main" >> /etc/apt/sources.list'
wget -O - http://apt.llvm.org/llvm-snapshot.gpg.key|sudo apt-key add -
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
sudo apt-get update -q


@@ -4,4 +4,4 @@ set -x
sudo apt-get remove -y llvm-*
sudo rm -rf /usr/local/*
sudo apt-get install -y clang-5.0 libclang-5.0 libclang-5.0-dev llvm-5.0 llvm-5.0-dev liblld-5.0 liblld-5.0-dev cmake wine1.6-amd64
sudo apt-get install -y clang-6.0 libclang-6.0 libclang-6.0-dev llvm-6.0 llvm-6.0-dev liblld-6.0 liblld-6.0-dev cmake wine1.6-amd64


@@ -3,8 +3,8 @@
set -x
set -e
export CC=clang-5.0
export CXX=clang++-5.0
export CC=clang-6.0
export CXX=clang++-6.0
echo $PATH
mkdir build
cd build
@@ -14,16 +14,16 @@ make install
./zig build --build-file ../build.zig test
./zig test ../test/behavior.zig --target-os windows --target-arch i386 --target-environ msvc
wine test.exe
wine zig-cache/test.exe
./zig test ../test/behavior.zig --target-os windows --target-arch i386 --target-environ msvc --release-fast
wine test.exe
wine zig-cache/test.exe
./zig test ../test/behavior.zig --target-os windows --target-arch i386 --target-environ msvc --release-safe
wine test.exe
wine zig-cache/test.exe
./zig test ../test/behavior.zig --target-os windows --target-arch x86_64 --target-environ msvc
wine64 test.exe
wine64 zig-cache/test.exe
#./zig test ../test/behavior.zig --target-os windows --target-arch x86_64 --target-environ msvc --release-fast
#wine64 test.exe


@@ -2,6 +2,6 @@
set -x
brew install llvm@5
brew outdated llvm@5 || brew upgrade llvm@5
brew install llvm@6
brew outdated llvm@6 || brew upgrade llvm@6


@@ -5,21 +5,8 @@ set -e
mkdir build
cd build
cmake .. -DCMAKE_PREFIX_PATH=/usr/local/opt/llvm@5/ -DCMAKE_INSTALL_PREFIX=$(pwd)
cmake .. -DCMAKE_PREFIX_PATH=/usr/local/opt/llvm@6/ -DCMAKE_INSTALL_PREFIX=$(pwd)
make VERBOSE=1
make install
# TODO: we run the tests separately because when run all together there is some
# mysterious issue where after N child process spawns it crashes. I've been
# unable to reproduce the issue on my macbook - it only happens on Travis.
# ./zig build --build-file ../build.zig test
./zig build --build-file ../build.zig test-behavior --verbose
./zig build --build-file ../build.zig test-std --verbose
./zig build --build-file ../build.zig test-compiler-rt --verbose
./zig build --build-file ../build.zig test-compare-output --verbose
./zig build --build-file ../build.zig test-build-examples --verbose
./zig build --build-file ../build.zig test-compile-errors --verbose
./zig build --build-file ../build.zig test-asm-link --verbose
./zig build --build-file ../build.zig test-debug-safety --verbose
./zig build --build-file ../build.zig test-parsec --verbose
./zig build --build-file ../build.zig test


@@ -26,16 +26,16 @@ if(MSVC)
else()
find_path(CLANG_INCLUDE_DIRS NAMES clang/Frontend/ASTUnit.h
PATHS
/usr/lib/llvm/5/include
/usr/lib/llvm-5.0/include
/usr/lib/llvm/6/include
/usr/lib/llvm-6.0/include
/mingw64/include)
macro(FIND_AND_ADD_CLANG_LIB _libname_)
string(TOUPPER ${_libname_} _prettylibname_)
find_library(CLANG_${_prettylibname_}_LIB NAMES ${_libname_}
PATHS
/usr/lib/llvm/5/lib
/usr/lib/llvm-5.0/lib
/usr/lib/llvm/6/lib
/usr/lib/llvm-6.0/lib
/mingw64/lib
/c/msys64/mingw64/lib
c:\\msys64\\mingw64\\lib)


@@ -6,12 +6,12 @@
# LLD_INCLUDE_DIRS
# LLD_LIBRARIES
find_path(LLD_INCLUDE_DIRS NAMES lld/Driver/Driver.h
find_path(LLD_INCLUDE_DIRS NAMES lld/Common/Driver.h
PATHS
/usr/lib/llvm-5.0/include
/usr/lib/llvm-6.0/include
/mingw64/include)
find_library(LLD_LIBRARY NAMES lld-5.0 lld PATHS /usr/lib/llvm-5.0/lib)
find_library(LLD_LIBRARY NAMES lld-6.0 lld PATHS /usr/lib/llvm-6.0/lib)
if(EXISTS ${LLD_LIBRARY})
set(LLD_LIBRARIES ${LLD_LIBRARY})
else()
@@ -19,7 +19,7 @@ else()
string(TOUPPER ${_libname_} _prettylibname_)
find_library(LLD_${_prettylibname_}_LIB NAMES ${_libname_}
PATHS
/usr/lib/llvm-5.0/lib
/usr/lib/llvm-6.0/lib
/mingw64/lib
/c/msys64/mingw64/lib
c:/msys64/mingw64/lib)
@@ -29,13 +29,14 @@ else()
endmacro(FIND_AND_ADD_LLD_LIB)
FIND_AND_ADD_LLD_LIB(lldDriver)
FIND_AND_ADD_LLD_LIB(lldMinGW)
FIND_AND_ADD_LLD_LIB(lldELF)
FIND_AND_ADD_LLD_LIB(lldCOFF)
FIND_AND_ADD_LLD_LIB(lldMachO)
FIND_AND_ADD_LLD_LIB(lldReaderWriter)
FIND_AND_ADD_LLD_LIB(lldCore)
FIND_AND_ADD_LLD_LIB(lldYAML)
FIND_AND_ADD_LLD_LIB(lldConfig)
FIND_AND_ADD_LLD_LIB(lldCommon)
endif()
include(FindPackageHandleStandardArgs)


@@ -8,12 +8,12 @@
# LLVM_LIBDIRS
find_program(LLVM_CONFIG_EXE
NAMES llvm-config-5.0 llvm-config
NAMES llvm-config-6.0 llvm-config
PATHS
"/mingw64/bin"
"/c/msys64/mingw64/bin"
"c:/msys64/mingw64/bin"
"C:/Libraries/llvm-5.0.0/bin")
"C:/Libraries/llvm-6.0.0/bin")
if(NOT(CMAKE_BUILD_TYPE STREQUAL "Debug"))
execute_process(
@@ -62,7 +62,7 @@ execute_process(
set(LLVM_LIBRARIES ${LLVM_LIBRARIES} ${LLVM_SYSTEM_LIBS})
if(NOT LLVM_LIBRARIES)
find_library(LLVM_LIBRARIES NAMES LLVM LLVM-5.0 LLVM-5)
find_library(LLVM_LIBRARIES NAMES LLVM LLVM-6.0 LLVM-6)
endif()
link_directories("${CMAKE_PREFIX_PATH}/lib")


@@ -1,13 +1,13 @@
License for Berkeley SoftFloat Release 3d
License for Berkeley SoftFloat Release 3e
John R. Hauser
2017 August 10
2018 January 20
The following applies to the whole of SoftFloat Release 3d as well as to
The following applies to the whole of SoftFloat Release 3e as well as to
each source file individually.
Copyright 2011, 2012, 2013, 2014, 2015, 2016, 2017 The Regents of the
Copyright 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018 The Regents of the
University of California. All rights reserved.
Redistribution and use in source and binary forms, with or without


@@ -7,11 +7,11 @@
<BODY>
<H1>Package Overview for Berkeley SoftFloat Release 3d</H1>
<H1>Package Overview for Berkeley SoftFloat Release 3e</H1>
<P>
John R. Hauser<BR>
2017 August 10<BR>
2018 January 20<BR>
</P>
<P>


@@ -1,8 +1,8 @@
Package Overview for Berkeley SoftFloat Release 3d
Package Overview for Berkeley SoftFloat Release 3e
John R. Hauser
2017 August 10
2018 January 20
Berkeley SoftFloat is a software implementation of binary floating-point
that conforms to the IEEE Standard for Floating-Point Arithmetic. SoftFloat


@@ -7,14 +7,57 @@
<BODY>
<H1>History of Berkeley SoftFloat, to Release 3d</H1>
<H1>History of Berkeley SoftFloat, to Release 3e</H1>
<P>
John R. Hauser<BR>
2017 August 10<BR>
2018 January 20<BR>
</P>
<H3>Release 3e (2018 January)</H3>
<UL>
<LI>
Changed the default numeric code for optional rounding mode <CODE>odd</CODE>
(round to odd, also known as <EM>jamming</EM>) from 5 to 6.
<LI>
Modified the behavior of rounding mode <CODE>odd</CODE> when rounding to an
integer value (either conversion to an integer format or a
&lsquo;<CODE>roundToInt</CODE>&rsquo; function).
Previously, for those cases only, rounding mode <CODE>odd</CODE> acted the same
as rounding to minimum magnitude.
Now all operations are rounded consistently.
<LI>
Fixed some errors in the specialization code modeling Intel x86 floating-point,
specifically the integers returned on invalid operations and the propagation of
NaN payloads in a few rare cases.
<LI>
Added specialization code modeling ARM floating-point, conforming to VFPv2 or
later.
<LI>
Added an example target for ARM processors.
<LI>
Fixed a minor bug whereby function <CODE>f16_to_ui64</CODE> might return a
different integer than expected in the case that the floating-point operand is
negative.
<LI>
Added example target-specific optimization for GCC, employing GCC intrinsics
and support for <NOBR>128-bit</NOBR> integer arithmetic.
<LI>
Made other minor improvements.
</UL>
<H3>Release 3d (2017 August)</H3>
<UL>


@@ -7,11 +7,11 @@
<BODY>
<H1>Berkeley SoftFloat Release 3d: Source Documentation</H1>
<H1>Berkeley SoftFloat Release 3e: Source Documentation</H1>
<P>
John R. Hauser<BR>
2017 August 10<BR>
2018 January 20<BR>
</P>
@@ -69,7 +69,7 @@ SoftFloat has been successfully compiled with the GNU C Compiler
<NOBR>Release 2</NOBR> or earlier.
Changes to the interface of SoftFloat functions are documented in
<A HREF="SoftFloat.html"><NOBR><CODE>SoftFloat.html</CODE></NOBR></A>.
The current version of SoftFloat is <NOBR>Release 3d</NOBR>.
The current version of SoftFloat is <NOBR>Release 3e</NOBR>.
</P>
@@ -114,7 +114,7 @@ and <CODE>&lt;stdint.h&gt;</CODE></I>.
The SoftFloat package was written by me, <NOBR>John R.</NOBR> Hauser.
<NOBR>Release 3</NOBR> of SoftFloat was a completely new implementation
supplanting earlier releases.
The project to create <NOBR>Release 3</NOBR> (now <NOBR>through 3d</NOBR>) was
The project to create <NOBR>Release 3</NOBR> (now <NOBR>through 3e</NOBR>) was
done in the employ of the University of California, Berkeley, within the
Department of Electrical Engineering and Computer Sciences, first for the
Parallel Computing Laboratory (Par Lab) and then for the ASPIRE Lab.
@@ -148,12 +148,12 @@ Oracle, and Samsung.
</P>
<P>
The following applies to the whole of SoftFloat <NOBR>Release 3d</NOBR> as well
The following applies to the whole of SoftFloat <NOBR>Release 3e</NOBR> as well
as to each source file individually.
</P>
<P>
Copyright 2011, 2012, 2013, 2014, 2015, 2016, 2017 The Regents of the
Copyright 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018 The Regents of the
University of California.
All rights reserved.
</P>
@@ -215,12 +215,15 @@ source
include
8086
8086-SSE
ARM-VFPv2
ARM-VFPv2-defaultNaN
build
template-FAST_INT64
template-not-FAST_INT64
Linux-386-GCC
Linux-386-SSE2-GCC
Linux-x86_64-GCC
Linux-ARM-VFPv2-GCC
Win32-MinGW
Win32-SSE2-MinGW
Win64-MinGW-w64
@@ -228,20 +231,37 @@ build
</BLOCKQUOTE>
The majority of the SoftFloat sources are provided in the <CODE>source</CODE>
directory.
The <CODE>include</CODE> subdirectory of <CODE>source</CODE> contains several
header files (unsurprisingly), while the <CODE>8086</CODE> and
<NOBR><CODE>8086-SSE</CODE></NOBR> subdirectories contain source files that
specialize the floating-point behavior to match the Intel x86 line of
processors.
The files in directory <CODE>8086</CODE> give floating-point behavior
consistent solely with Intel&rsquo;s older, 8087-derived floating-point, while
those in <NOBR><CODE>8086-SSE</CODE></NOBR> update the behavior of the
non-extended formats (<CODE>float16_t</CODE>, <CODE>float32_t</CODE>,
<CODE>float64_t</CODE>, and <CODE>float128_t</CODE>) to mirror Intel&rsquo;s
more recent Streaming SIMD Extensions (SSE) and other compatible extensions.
The <CODE>include</CODE> subdirectory contains several header files
(unsurprisingly), while the other subdirectories of <CODE>source</CODE> contain
source files that specialize the floating-point behavior to match particular
processor families:
<BLOCKQUOTE>
<DL>
<DT><CODE>8086</CODE></DT>
<DD>
Intel&rsquo;s older, 8087-derived floating-point, extended to all supported
floating-point types
</DD>
<DT><CODE>8086-SSE</CODE></DT>
<DD>
Intel&rsquo;s x86 processors with Streaming SIMD Extensions (SSE) and later
compatible extensions, having 8087 behavior for <NOBR>80-bit</NOBR>
double-extended-precision (<CODE>extFloat80_t</CODE>) and SSE behavior for
other floating-point types
</DD>
<DT><CODE>ARM-VFPv2</CODE></DT>
<DD>
ARM&rsquo;s VFPv2 or later floating-point, with NaN payload propagation
</DD>
<DT><CODE>ARM-VFPv2-defaultNaN</CODE></DT>
<DD>
ARM&rsquo;s VFPv2 or later floating-point, with the &ldquo;default NaN&rdquo;
option
</DD>
</DL>
</BLOCKQUOTE>
If other specializations are attempted, these would be expected to be other
subdirectories of <CODE>source</CODE> alongside <CODE>8086</CODE> and
<NOBR><CODE>8086-SSE</CODE></NOBR>.
subdirectories of <CODE>source</CODE> alongside the ones listed above.
Specialization is covered later, in <NOBR>section 5.2</NOBR>, <I>Specializing
Floating-Point Behavior</I>.
</P>
@@ -264,19 +284,20 @@ are intended to follow a naming system of
For the example targets,
<NOBR><CODE>&lt;<I>execution-environment</I>&gt;</CODE></NOBR> is
<NOBR><CODE>Linux-386</CODE></NOBR>, <NOBR><CODE>Linux-386-SSE2</CODE></NOBR>,
<NOBR><CODE>Linux-x86_64</CODE></NOBR>, <CODE>Win32</CODE>,
<NOBR><CODE>Linux-x86_64</CODE></NOBR>,
<NOBR><CODE>Linux-ARM-VFPv2</CODE></NOBR>, <CODE>Win32</CODE>,
<NOBR><CODE>Win32-SSE2</CODE></NOBR>, or <CODE>Win64</CODE>, and
<NOBR><CODE>&lt;<I>compiler</I>&gt;</CODE></NOBR> is <CODE>GCC</CODE>,
<CODE>MinGW</CODE>, or <NOBR><CODE>MinGW-w64</CODE></NOBR>.
</P>
<P>
At the current time, all of the supplied target directories are merely examples
that may or may not be correct for compiling on any particular system.
All of the supplied target directories are merely examples that may or may not
be correct for compiling on any particular system.
Despite requests, there are currently no plans to include and maintain in the
SoftFloat package the build files needed for a great many users&rsquo;
compilation environments, which after all can span a broad range of operating
systems, compilers, and other tools.
compilation environments, which can span a huge range of operating systems,
compilers, and other tools.
</P>
<P>
@@ -402,8 +423,8 @@ A new build target may use an existing specialization, such as the ones
provided by the <CODE>8086</CODE> and <NOBR><CODE>8086-SSE</CODE></NOBR>
subdirectories.
If a build target needs a new specialization, different from any existing ones,
it is recommended that a new specialization subdirectory be created in the
<CODE>source</CODE> directory for this purpose.
it is recommended that a new specialization directory be created for this
purpose.
The <CODE>specialize.h</CODE> header file from any of the provided
specialization subdirectories can be used as a model for what definitions are
needed.
@@ -577,8 +598,40 @@ function.
This technically defines <NOBR><CODE>&lt;<I>function-name</I>&gt;</CODE></NOBR>
as a macro, but one that resolves to the same name, which may then be a
function.
(A preprocessor that conforms to the C Standard must limit recursive macro
expansion from being applied more than once.)
(A preprocessor that conforms to the C Standard is required to limit recursive
macro expansion from being applied more than once.)
</P>
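A minimal C sketch of the self-referential macro idiom described above. The names `demo_mul` and `demo_mul_is_overridden` are hypothetical and are not part of SoftFloat. The sketch assumes the idiom is used so that a generic source file can test with `#ifdef` whether an optimized version has already been supplied, while calls to the name still reach the function.

```c
/* Hypothetical example; demo_mul is not a SoftFloat function. */
#include <stdint.h>

/* Target-specific header: announce that demo_mul is provided here.
   The macro expands to its own name, and the preprocessor does not
   re-expand a macro name inside its own expansion. */
#define demo_mul demo_mul
static inline uint32_t demo_mul( uint32_t a, uint32_t b )
{
    return a * b;
}

/* Generic source file: supply a fallback only if no override exists. */
#ifndef demo_mul
static uint32_t demo_mul( uint32_t a, uint32_t b )
{
    uint32_t z = 0;
    while ( b-- ) z += a;
    return z;
}
#endif

/* Reports whether the override was detected by the preprocessor. */
static int demo_mul_is_overridden( void )
{
#ifdef demo_mul
    return 1;
#else
    return 0;
#endif
}
```

Because `demo_mul` is defined as a macro before the `#ifndef`, the fallback body is skipped and only the first definition is compiled.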
<P>
The supplied header file <CODE>opts-GCC.h</CODE> (in directory
<CODE>source/include</CODE>) provides an example of target-specific
optimization for the GCC compiler.
Each GCC target example in the <CODE>build</CODE> directory has
<BLOCKQUOTE>
<CODE>#include "opts-GCC.h"</CODE>
</BLOCKQUOTE>
in its <CODE>platform.h</CODE> header file.
Before <CODE>opts-GCC.h</CODE> is included, the following macros must be
defined (or not) to control which features are invoked:
<BLOCKQUOTE>
<DL>
<DT><CODE>SOFTFLOAT_BUILTIN_CLZ</CODE></DT>
<DD>
If defined, SoftFloat&rsquo;s internal
&lsquo;<CODE>countLeadingZeros</CODE>&rsquo; functions use intrinsics
<CODE>__builtin_clz</CODE> and <CODE>__builtin_clzll</CODE>.
</DD>
<DT><CODE>SOFTFLOAT_INTRINSIC_INT128</CODE></DT>
<DD>
If defined, SoftFloat makes use of GCC&rsquo;s nonstandard <NOBR>128-bit</NOBR>
integer type <CODE>__int128</CODE>.
</DD>
</DL>
</BLOCKQUOTE>
On some machines, these improvements are observed to increase the speeds of
<CODE>f64_mul</CODE> and <CODE>f128_mul</CODE> by around 20 to 25%, although
other functions receive less dramatic boosts, or none at all.
Results can vary greatly across different platforms.
</P>
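A rough C sketch of the kind of switch that `SOFTFLOAT_BUILTIN_CLZ` controls. The helper `demo_countLeadingZeros32` is hypothetical and is not SoftFloat's actual implementation. The sketch assumes a GCC-compatible compiler and a 32-bit `unsigned int`, and it defines the macro in the same file only to keep the example self-contained; in the package the macro would be defined (or not) before `opts-GCC.h` is included.

```c
#include <stdint.h>

/* Hypothetical stand-in for the choice a platform.h would make. */
#define SOFTFLOAT_BUILTIN_CLZ 1

/* Hypothetical helper, not SoftFloat's actual function. */
static uint_fast8_t demo_countLeadingZeros32( uint32_t a )
{
    if ( ! a ) return 32;   /* __builtin_clz( 0 ) is undefined */
#ifdef SOFTFLOAT_BUILTIN_CLZ
    /* Use the compiler intrinsic when the feature macro is defined. */
    return (uint_fast8_t) __builtin_clz( a );
#else
    /* Portable fallback: shift left until the top bit is set. */
    uint_fast8_t count = 0;
    while ( ! (a & 0x80000000) ) { a <<= 1; ++count; }
    return count;
#endif
}
```

Removing the `#define` selects the portable loop, which returns the same values without relying on the intrinsic.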


@@ -7,11 +7,11 @@
<BODY>
<H1>Berkeley SoftFloat Release 3d: Library Interface</H1>
<H1>Berkeley SoftFloat Release 3e: Library Interface</H1>
<P>
John R. Hauser<BR>
2017 August 10<BR>
2018 January 20<BR>
</P>
@@ -106,13 +106,20 @@ Information about the standard is available elsewhere.
</P>
<P>
The current version of SoftFloat is <NOBR>Release 3d</NOBR>.
This release fixes bugs that were found in the square root functions for the
<NOBR>64-bit</NOBR>, <NOBR>80-bit</NOBR>, and <NOBR>128-bit</NOBR>
floating-point formats.
The current version of SoftFloat is <NOBR>Release 3e</NOBR>.
This release modifies the behavior of the rarely used <I>odd</I> rounding mode
(<I>round to odd</I>, also known as <I>jamming</I>), and also adds some new
specialization and optimization examples for those compiling SoftFloat.
</P>
<P>
The previous <NOBR>Release 3d</NOBR> fixed bugs that were found in the square
root functions for the <NOBR>64-bit</NOBR>, <NOBR>80-bit</NOBR>, and
<NOBR>128-bit</NOBR> floating-point formats.
(Thanks to Alexei Sibidanov at the University of Victoria for reporting an
incorrect result.)
The bugs affected all prior <NOBR>Release-3</NOBR> versions of SoftFloat.
The bugs affected all prior <NOBR>Release-3</NOBR> versions of SoftFloat
<NOBR>through 3c</NOBR>.
The flaw in the <NOBR>64-bit</NOBR> floating-point square root function was of
very minor impact, causing a <NOBR>1-ulp</NOBR> error (<NOBR>1 unit</NOBR> in
the last place) a few times out of a billion.
@@ -124,13 +131,8 @@ wrong.
</P>
<P>
<NOBR>Release 3d</NOBR> makes no changes to the SoftFloat library interface
compared to the previous <NOBR>Release 3c</NOBR>.
Since the original <NOBR>Release 3</NOBR>, the main changes to the interface
have been that <NOBR>Release 3b</NOBR> added support for the
<NOBR>16-bit</NOBR> half-precision format, and <NOBR>Release 3c</NOBR> added
optional support for a rarely used rounding mode, <I>round to odd</I>, also
known as <I>jamming</I>.
Among earlier releases, 3b was notable for adding support for the
<NOBR>16-bit</NOBR> half-precision format.
For more about the evolution of SoftFloat releases, see
<A HREF="SoftFloat-history.html"><NOBR><CODE>SoftFloat-history.html</CODE></NOBR></A>.
</P>
@@ -169,7 +171,7 @@ strictly required.
<P>
Most operations not required by the original 1985 version of the IEEE
Floating-Point Standard but added in the 2008 version are not yet supported in
SoftFloat <NOBR>Release 3d</NOBR>.
SoftFloat <NOBR>Release 3e</NOBR>.
</P>
@@ -179,7 +181,7 @@ SoftFloat <NOBR>Release 3d</NOBR>.
The SoftFloat package was written by me, <NOBR>John R.</NOBR> Hauser.
<NOBR>Release 3</NOBR> of SoftFloat was a completely new implementation
supplanting earlier releases.
The project to create <NOBR>Release 3</NOBR> (now <NOBR>through 3d</NOBR>) was
The project to create <NOBR>Release 3</NOBR> (now <NOBR>through 3e</NOBR>) was
done in the employ of the University of California, Berkeley, within the
Department of Electrical Engineering and Computer Sciences, first for the
Parallel Computing Laboratory (Par Lab) and then for the ASPIRE Lab.
@@ -213,12 +215,12 @@ Oracle, and Samsung.
</P>
<P>
The following applies to the whole of SoftFloat <NOBR>Release 3d</NOBR> as well
The following applies to the whole of SoftFloat <NOBR>Release 3e</NOBR> as well
as to each source file individually.
</P>
<P>
Copyright 2011, 2012, 2013, 2014, 2015, 2016, 2017 The Regents of the
Copyright 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018 The Regents of the
University of California.
All rights reserved.
</P>
@@ -395,7 +397,7 @@ comparisons between two values in the same floating-point format.
<P>
The following operations required by the 2008 IEEE Floating-Point Standard are
not supported in SoftFloat <NOBR>Release 3d</NOBR>:
not supported in SoftFloat <NOBR>Release 3e</NOBR>:
<UL>
<LI>
<B>nextUp</B>, <B>nextDown</B>, <B>minNum</B>, <B>maxNum</B>, <B>minNumMag</B>,
@@ -445,8 +447,8 @@ exponent must both be zero.
</P>
<P>
SoftFloat's functions are not guaranteed to operate as expected when inputs of
type <CODE>extFloat80_t</CODE> are non-canonical.
SoftFloat&rsquo;s functions are not guaranteed to operate as expected when
inputs of type <CODE>extFloat80_t</CODE> are non-canonical.
Assuming all of a function&rsquo;s <CODE>extFloat80_t</CODE> inputs (if any)
are canonical, function outputs of type <CODE>extFloat80_t</CODE> will always
be canonical.
@@ -591,16 +593,15 @@ Variable <CODE>softfloat_roundingMode</CODE> is initialized to
</P>
<P>
If supported, mode <CODE>softfloat_round_odd</CODE> first rounds a
floating-point result to minimum magnitude, the same as
When <CODE>softfloat_round_odd</CODE> is the rounding mode for a function that
rounds to an integer value (either conversion to an integer format or a
&lsquo;<CODE>roundToInt</CODE>&rsquo; function), if the input is not already an
integer, the rounded result is the closest <EM>odd</EM> integer.
For other operations, this rounding mode acts as though the floating-point
result is first rounded to minimum magnitude, the same as
<CODE>softfloat_round_minMag</CODE>, and then, if the result is inexact, the
least-significant bit of the result is set <NOBR>to 1</NOBR>.
This rounding mode is also known as <EM>jamming</EM>.
As a special case, when <CODE>softfloat_round_odd</CODE> is the rounding mode
for a function that rounds to an integer value (either conversion to an integer
format or a &lsquo;<CODE>roundToInt</CODE>&rsquo; function), rounding is the
same as <CODE>softfloat_round_minMag</CODE>, without any change to the
least-significant integer bit.
Rounding to odd is also known as <EM>jamming</EM>.
</P>
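A small C sketch of the Release 3e behavior of rounding to odd when the result is an integer. The helper `demo_roundToOdd_i64` is hypothetical, operates on a native `double` rather than on SoftFloat's types, and assumes the input is finite and well within the range of `int64_t`. It only illustrates the rule: an exact integer passes through unchanged, and any other value is truncated toward zero and then has its least-significant bit set, which yields the closest odd integer.

```c
#include <stdint.h>

/* Hypothetical helper; assumes x is finite and well inside int64_t range. */
static int64_t demo_roundToOdd_i64( double x )
{
    int64_t t = (int64_t) x;          /* C conversion truncates toward zero */
    uint64_t mag;
    if ( (double) t == x ) return t;  /* exact: integers pass through unchanged */
    mag = (uint64_t) (t < 0 ? -t : t);
    mag |= 1;                         /* inexact: jam the least-significant bit */
    return (x < 0) ? -(int64_t) mag : (int64_t) mag;
}
```

Under the pre-3e behavior described in the removed lines, the inexact cases would instead have returned the truncated value `t` unchanged.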
<H3>6.2. Underflow Detection</H3>
@@ -820,12 +821,6 @@ The <CODE><I>roundingMode</I></CODE> argument specifies the rounding mode for
the conversion.
The variable that usually indicates rounding mode,
<CODE>softfloat_roundingMode</CODE>, is ignored.
If <CODE><I>roundingMode</I></CODE> is <CODE>softfloat_round_odd</CODE>,
rounding is to minimum magnitude, the same as
<CODE>softfloat_round_minMag</CODE>, rather than to an odd integer.
</P>
<P>
Argument <CODE><I>exact</I></CODE> determines whether the <I>inexact</I>
exception flag is raised if the conversion is not exact.
If <CODE><I>exact</I></CODE> is <CODE>true</CODE>, the <I>inexact</I> flag may
@@ -1087,12 +1082,6 @@ The <CODE><I>roundingMode</I></CODE> argument specifies the rounding mode to
apply.
The variable that usually indicates rounding mode,
<CODE>softfloat_roundingMode</CODE>, is ignored.
If <CODE><I>roundingMode</I></CODE> is <CODE>softfloat_round_odd</CODE>,
rounding is to minimum magnitude, the same as
<CODE>softfloat_round_minMag</CODE>, rather than to an odd integer value.
</P>
<P>
Argument <CODE><I>exact</I></CODE> determines whether the <I>inexact</I>
exception flag is raised if the conversion is not exact.
If <CODE><I>exact</I></CODE> is <CODE>true</CODE>, the <I>inexact</I> flag may


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014, 2015 The Regents of the University of
California. All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014, 2015 The Regents of the University of
California. All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,10 +2,10 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.
Copyright 2011, 2012, 2013, 2014, 2018 The Regents of the University of
California. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
@@ -42,9 +42,9 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include "softfloat.h"
/*----------------------------------------------------------------------------
| Interpreting the unsigned integer formed from concatenating `uiA64' and
| `uiA0' as an 80-bit extended floating-point value, and likewise interpreting
| the unsigned integer formed from concatenating `uiB64' and `uiB0' as another
| Interpreting the unsigned integer formed from concatenating 'uiA64' and
| 'uiA0' as an 80-bit extended floating-point value, and likewise interpreting
| the unsigned integer formed from concatenating 'uiB64' and 'uiB0' as another
| 80-bit extended floating-point value, and assuming at least one of these
| floating-point values is a NaN, returns the bit pattern of the combined NaN
| result. If either original floating-point value is a signaling NaN, the
@@ -90,8 +90,8 @@ struct uint128
uiMagB64 = uiB64 & 0x7FFF;
if ( uiMagA64 < uiMagB64 ) goto returnB;
if ( uiMagB64 < uiMagA64 ) goto returnA;
if ( uiNonsigA0 < uiNonsigB0 ) goto returnB;
if ( uiNonsigB0 < uiNonsigA0 ) goto returnA;
if ( uiA0 < uiB0 ) goto returnB;
if ( uiB0 < uiA0 ) goto returnA;
if ( uiA64 < uiB64 ) goto returnA;
returnB:
uiZ.v64 = uiB64;


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014, 2015 The Regents of the University of
California. All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,10 +2,10 @@
/*============================================================================
This C header file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014, 2015, 2016 The Regents of the University of
California. All rights reserved.
Copyright 2011, 2012, 2013, 2014, 2015, 2016, 2018 The Regents of the
University of California. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
@@ -39,10 +39,11 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include <stdbool.h>
#include <stdint.h>
#include "softfloat_types.h"
#include "primitiveTypes.h"
#include "softfloat.h"
/*----------------------------------------------------------------------------
| Default value for `softfloat_detectTininess'.
| Default value for 'softfloat_detectTininess'.
*----------------------------------------------------------------------------*/
#define init_detectTininess softfloat_tininess_afterRounding
@@ -51,22 +52,22 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
| invalid exception.
*----------------------------------------------------------------------------*/
#define ui32_fromPosOverflow 0xFFFFFFFF
#define ui32_fromNegOverflow 0
#define ui32_fromNegOverflow 0xFFFFFFFF
#define ui32_fromNaN 0xFFFFFFFF
#define i32_fromPosOverflow 0x7FFFFFFF
#define i32_fromPosOverflow (-0x7FFFFFFF - 1)
#define i32_fromNegOverflow (-0x7FFFFFFF - 1)
#define i32_fromNaN 0x7FFFFFFF
#define i32_fromNaN (-0x7FFFFFFF - 1)
/*----------------------------------------------------------------------------
| The values to return on conversions to 64-bit integer formats that raise an
| invalid exception.
*----------------------------------------------------------------------------*/
#define ui64_fromPosOverflow UINT64_C( 0xFFFFFFFFFFFFFFFF )
#define ui64_fromNegOverflow 0
#define ui64_fromNegOverflow UINT64_C( 0xFFFFFFFFFFFFFFFF )
#define ui64_fromNaN UINT64_C( 0xFFFFFFFFFFFFFFFF )
#define i64_fromPosOverflow UINT64_C( 0x7FFFFFFFFFFFFFFF )
#define i64_fromNegOverflow (-UINT64_C( 0x7FFFFFFFFFFFFFFF ) - 1)
#define i64_fromNaN UINT64_C( 0x7FFFFFFFFFFFFFFF )
#define i64_fromPosOverflow (-INT64_C( 0x7FFFFFFFFFFFFFFF ) - 1)
#define i64_fromNegOverflow (-INT64_C( 0x7FFFFFFFFFFFFFFF ) - 1)
#define i64_fromNaN (-INT64_C( 0x7FFFFFFFFFFFFFFF ) - 1)
/*----------------------------------------------------------------------------
| "Common NaN" structure, used to transfer NaN representations from one format
@@ -87,30 +88,30 @@ struct commonNaN {
#define defaultNaNF16UI 0xFE00
/*----------------------------------------------------------------------------
| Returns true when 16-bit unsigned integer `uiA' has the bit pattern of a
| Returns true when 16-bit unsigned integer 'uiA' has the bit pattern of a
| 16-bit floating-point signaling NaN.
| Note: This macro evaluates its argument more than once.
*----------------------------------------------------------------------------*/
#define softfloat_isSigNaNF16UI( uiA ) ((((uiA) & 0x7E00) == 0x7C00) && ((uiA) & 0x01FF))
/*----------------------------------------------------------------------------
| Assuming `uiA' has the bit pattern of a 16-bit floating-point NaN, converts
| Assuming 'uiA' has the bit pattern of a 16-bit floating-point NaN, converts
| this NaN to the common NaN form, and stores the resulting common NaN at the
| location pointed to by `zPtr'. If the NaN is a signaling NaN, the invalid
| location pointed to by 'zPtr'. If the NaN is a signaling NaN, the invalid
| exception is raised.
*----------------------------------------------------------------------------*/
void softfloat_f16UIToCommonNaN( uint_fast16_t uiA, struct commonNaN *zPtr );
/*----------------------------------------------------------------------------
| Converts the common NaN pointed to by `aPtr' into a 16-bit floating-point
| Converts the common NaN pointed to by 'aPtr' into a 16-bit floating-point
| NaN, and returns the bit pattern of this value as an unsigned integer.
*----------------------------------------------------------------------------*/
uint_fast16_t softfloat_commonNaNToF16UI( const struct commonNaN *aPtr );
/*----------------------------------------------------------------------------
| Interpreting `uiA' and `uiB' as the bit patterns of two 16-bit floating-
| Interpreting 'uiA' and 'uiB' as the bit patterns of two 16-bit floating-
| point values, at least one of which is a NaN, returns the bit pattern of
| the combined NaN result. If either `uiA' or `uiB' has the pattern of a
| the combined NaN result. If either 'uiA' or 'uiB' has the pattern of a
| signaling NaN, the invalid exception is raised.
*----------------------------------------------------------------------------*/
uint_fast16_t
@@ -122,30 +123,30 @@ uint_fast16_t
#define defaultNaNF32UI 0xFFC00000
/*----------------------------------------------------------------------------
| Returns true when 32-bit unsigned integer `uiA' has the bit pattern of a
| Returns true when 32-bit unsigned integer 'uiA' has the bit pattern of a
| 32-bit floating-point signaling NaN.
| Note: This macro evaluates its argument more than once.
*----------------------------------------------------------------------------*/
#define softfloat_isSigNaNF32UI( uiA ) ((((uiA) & 0x7FC00000) == 0x7F800000) && ((uiA) & 0x003FFFFF))
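An illustrative C sketch of the same bit test written as a function. The name `demo_isSigNaNF32UI` is hypothetical and does not appear in SoftFloat. A function evaluates its argument once, so it avoids the multiple-evaluation caveat noted for the macro. The masks are copied from the macro above: the exponent field must be all ones, the quiet bit (bit 22) must be clear, and at least one lower fraction bit must be set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical function form of the macro shown above; being a function,
   it evaluates its argument exactly once. */
static bool demo_isSigNaNF32UI( uint32_t uiA )
{
    return ((uiA & 0x7FC00000) == 0x7F800000) && (uiA & 0x003FFFFF);
}
```

With these masks, an infinity (`0x7F800000`) and the default quiet NaN (`0xFFC00000`) are both rejected, while a pattern such as `0x7FA00000` is reported as signaling.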
/*----------------------------------------------------------------------------
| Assuming `uiA' has the bit pattern of a 32-bit floating-point NaN, converts
| Assuming 'uiA' has the bit pattern of a 32-bit floating-point NaN, converts
| this NaN to the common NaN form, and stores the resulting common NaN at the
| location pointed to by `zPtr'. If the NaN is a signaling NaN, the invalid
| location pointed to by 'zPtr'. If the NaN is a signaling NaN, the invalid
| exception is raised.
*----------------------------------------------------------------------------*/
void softfloat_f32UIToCommonNaN( uint_fast32_t uiA, struct commonNaN *zPtr );
/*----------------------------------------------------------------------------
| Converts the common NaN pointed to by `aPtr' into a 32-bit floating-point
| Converts the common NaN pointed to by 'aPtr' into a 32-bit floating-point
| NaN, and returns the bit pattern of this value as an unsigned integer.
*----------------------------------------------------------------------------*/
uint_fast32_t softfloat_commonNaNToF32UI( const struct commonNaN *aPtr );
/*----------------------------------------------------------------------------
| Interpreting `uiA' and `uiB' as the bit patterns of two 32-bit floating-
| Interpreting 'uiA' and 'uiB' as the bit patterns of two 32-bit floating-
| point values, at least one of which is a NaN, returns the bit pattern of
| the combined NaN result. If either `uiA' or `uiB' has the pattern of a
| the combined NaN result. If either 'uiA' or 'uiB' has the pattern of a
| signaling NaN, the invalid exception is raised.
*----------------------------------------------------------------------------*/
uint_fast32_t
@@ -157,30 +158,30 @@ uint_fast32_t
#define defaultNaNF64UI UINT64_C( 0xFFF8000000000000 )
/*----------------------------------------------------------------------------
| Returns true when 64-bit unsigned integer `uiA' has the bit pattern of a
| Returns true when 64-bit unsigned integer 'uiA' has the bit pattern of a
| 64-bit floating-point signaling NaN.
| Note: This macro evaluates its argument more than once.
*----------------------------------------------------------------------------*/
#define softfloat_isSigNaNF64UI( uiA ) ((((uiA) & UINT64_C( 0x7FF8000000000000 )) == UINT64_C( 0x7FF0000000000000 )) && ((uiA) & UINT64_C( 0x0007FFFFFFFFFFFF )))
/*----------------------------------------------------------------------------
| Assuming `uiA' has the bit pattern of a 64-bit floating-point NaN, converts
| Assuming 'uiA' has the bit pattern of a 64-bit floating-point NaN, converts
| this NaN to the common NaN form, and stores the resulting common NaN at the
| location pointed to by `zPtr'. If the NaN is a signaling NaN, the invalid
| location pointed to by 'zPtr'. If the NaN is a signaling NaN, the invalid
| exception is raised.
*----------------------------------------------------------------------------*/
void softfloat_f64UIToCommonNaN( uint_fast64_t uiA, struct commonNaN *zPtr );
/*----------------------------------------------------------------------------
| Converts the common NaN pointed to by `aPtr' into a 64-bit floating-point
| Converts the common NaN pointed to by 'aPtr' into a 64-bit floating-point
| NaN, and returns the bit pattern of this value as an unsigned integer.
*----------------------------------------------------------------------------*/
uint_fast64_t softfloat_commonNaNToF64UI( const struct commonNaN *aPtr );
/*----------------------------------------------------------------------------
| Interpreting `uiA' and `uiB' as the bit patterns of two 64-bit floating-
| Interpreting 'uiA' and 'uiB' as the bit patterns of two 64-bit floating-
| point values, at least one of which is a NaN, returns the bit pattern of
| the combined NaN result. If either `uiA' or `uiB' has the pattern of a
| the combined NaN result. If either 'uiA' or 'uiB' has the pattern of a
| signaling NaN, the invalid exception is raised.
*----------------------------------------------------------------------------*/
uint_fast64_t
@@ -194,7 +195,7 @@ uint_fast64_t
/*----------------------------------------------------------------------------
| Returns true when the 80-bit unsigned integer formed from concatenating
| 16-bit `uiA64' and 64-bit `uiA0' has the bit pattern of an 80-bit extended
| 16-bit 'uiA64' and 64-bit 'uiA0' has the bit pattern of an 80-bit extended
| floating-point signaling NaN.
| Note: This macro evaluates its arguments more than once.
*----------------------------------------------------------------------------*/
@@ -203,15 +204,15 @@ uint_fast64_t
#ifdef SOFTFLOAT_FAST_INT64
/*----------------------------------------------------------------------------
| The following functions are needed only when `SOFTFLOAT_FAST_INT64' is
| The following functions are needed only when 'SOFTFLOAT_FAST_INT64' is
| defined.
*----------------------------------------------------------------------------*/
/*----------------------------------------------------------------------------
| Assuming the unsigned integer formed from concatenating `uiA64' and `uiA0'
| Assuming the unsigned integer formed from concatenating 'uiA64' and 'uiA0'
| has the bit pattern of an 80-bit extended floating-point NaN, converts
| this NaN to the common NaN form, and stores the resulting common NaN at the
| location pointed to by `zPtr'. If the NaN is a signaling NaN, the invalid
| location pointed to by 'zPtr'. If the NaN is a signaling NaN, the invalid
| exception is raised.
*----------------------------------------------------------------------------*/
void
@@ -219,16 +220,16 @@ void
uint_fast16_t uiA64, uint_fast64_t uiA0, struct commonNaN *zPtr );
/*----------------------------------------------------------------------------
| Converts the common NaN pointed to by `aPtr' into an 80-bit extended
| Converts the common NaN pointed to by 'aPtr' into an 80-bit extended
| floating-point NaN, and returns the bit pattern of this value as an unsigned
| integer.
*----------------------------------------------------------------------------*/
struct uint128 softfloat_commonNaNToExtF80UI( const struct commonNaN *aPtr );
/*----------------------------------------------------------------------------
| Interpreting the unsigned integer formed from concatenating `uiA64' and
| `uiA0' as an 80-bit extended floating-point value, and likewise interpreting
| the unsigned integer formed from concatenating `uiB64' and `uiB0' as another
| Interpreting the unsigned integer formed from concatenating 'uiA64' and
| 'uiA0' as an 80-bit extended floating-point value, and likewise interpreting
| the unsigned integer formed from concatenating 'uiB64' and 'uiB0' as another
| 80-bit extended floating-point value, and assuming at least on of these
| floating-point values is a NaN, returns the bit pattern of the combined NaN
| result. If either original floating-point value is a signaling NaN, the
@@ -250,17 +251,17 @@ struct uint128
/*----------------------------------------------------------------------------
| Returns true when the 128-bit unsigned integer formed from concatenating
| 64-bit `uiA64' and 64-bit `uiA0' has the bit pattern of a 128-bit floating-
| 64-bit 'uiA64' and 64-bit 'uiA0' has the bit pattern of a 128-bit floating-
| point signaling NaN.
| Note: This macro evaluates its arguments more than once.
*----------------------------------------------------------------------------*/
#define softfloat_isSigNaNF128UI( uiA64, uiA0 ) ((((uiA64) & UINT64_C( 0x7FFF800000000000 )) == UINT64_C( 0x7FFF000000000000 )) && ((uiA0) || ((uiA64) & UINT64_C( 0x00007FFFFFFFFFFF ))))
/*----------------------------------------------------------------------------
| Assuming the unsigned integer formed from concatenating `uiA64' and `uiA0'
| Assuming the unsigned integer formed from concatenating 'uiA64' and 'uiA0'
| has the bit pattern of a 128-bit floating-point NaN, converts this NaN to
| the common NaN form, and stores the resulting common NaN at the location
| pointed to by `zPtr'. If the NaN is a signaling NaN, the invalid exception
| pointed to by 'zPtr'. If the NaN is a signaling NaN, the invalid exception
| is raised.
*----------------------------------------------------------------------------*/
void
@@ -268,15 +269,15 @@ void
uint_fast64_t uiA64, uint_fast64_t uiA0, struct commonNaN *zPtr );
/*----------------------------------------------------------------------------
| Converts the common NaN pointed to by `aPtr' into a 128-bit floating-point
| Converts the common NaN pointed to by 'aPtr' into a 128-bit floating-point
| NaN, and returns the bit pattern of this value as an unsigned integer.
*----------------------------------------------------------------------------*/
struct uint128 softfloat_commonNaNToF128UI( const struct commonNaN * );
/*----------------------------------------------------------------------------
| Interpreting the unsigned integer formed from concatenating `uiA64' and
| `uiA0' as a 128-bit floating-point value, and likewise interpreting the
| unsigned integer formed from concatenating `uiB64' and `uiB0' as another
| Interpreting the unsigned integer formed from concatenating 'uiA64' and
| 'uiA0' as a 128-bit floating-point value, and likewise interpreting the
| unsigned integer formed from concatenating 'uiB64' and 'uiB0' as another
| 128-bit floating-point value, and assuming at least on of these floating-
| point values is a NaN, returns the bit pattern of the combined NaN result.
| If either original floating-point value is a signaling NaN, the invalid
@@ -293,14 +294,14 @@ struct uint128
#else
/*----------------------------------------------------------------------------
| The following functions are needed only when `SOFTFLOAT_FAST_INT64' is not
| The following functions are needed only when 'SOFTFLOAT_FAST_INT64' is not
| defined.
*----------------------------------------------------------------------------*/
/*----------------------------------------------------------------------------
| Assuming the 80-bit extended floating-point value pointed to by `aSPtr' is
| Assuming the 80-bit extended floating-point value pointed to by 'aSPtr' is
| a NaN, converts this NaN to the common NaN form, and stores the resulting
| common NaN at the location pointed to by `zPtr'. If the NaN is a signaling
| common NaN at the location pointed to by 'zPtr'. If the NaN is a signaling
| NaN, the invalid exception is raised.
*----------------------------------------------------------------------------*/
void
@@ -308,9 +309,9 @@ void
const struct extFloat80M *aSPtr, struct commonNaN *zPtr );
/*----------------------------------------------------------------------------
| Converts the common NaN pointed to by `aPtr' into an 80-bit extended
| Converts the common NaN pointed to by 'aPtr' into an 80-bit extended
| floating-point NaN, and stores this NaN at the location pointed to by
| `zSPtr'.
| 'zSPtr'.
*----------------------------------------------------------------------------*/
void
softfloat_commonNaNToExtF80M(
@@ -318,8 +319,8 @@ void
/*----------------------------------------------------------------------------
| Assuming at least one of the two 80-bit extended floating-point values
| pointed to by `aSPtr' and `bSPtr' is a NaN, stores the combined NaN result
| at the location pointed to by `zSPtr'. If either original floating-point
| pointed to by 'aSPtr' and 'bSPtr' is a NaN, stores the combined NaN result
| at the location pointed to by 'zSPtr'. If either original floating-point
| value is a signaling NaN, the invalid exception is raised.
*----------------------------------------------------------------------------*/
void
@@ -338,10 +339,10 @@ void
#define defaultNaNF128UI0 0
/*----------------------------------------------------------------------------
| Assuming the 128-bit floating-point value pointed to by `aWPtr' is a NaN,
| Assuming the 128-bit floating-point value pointed to by 'aWPtr' is a NaN,
| converts this NaN to the common NaN form, and stores the resulting common
| NaN at the location pointed to by `zPtr'. If the NaN is a signaling NaN,
| the invalid exception is raised. Argument `aWPtr' points to an array of
| NaN at the location pointed to by 'zPtr'. If the NaN is a signaling NaN,
| the invalid exception is raised. Argument 'aWPtr' points to an array of
| four 32-bit elements that concatenate in the platform's normal endian order
| to form a 128-bit floating-point value.
*----------------------------------------------------------------------------*/
@@ -349,9 +350,9 @@ void
softfloat_f128MToCommonNaN( const uint32_t *aWPtr, struct commonNaN *zPtr );
/*----------------------------------------------------------------------------
| Converts the common NaN pointed to by `aPtr' into a 128-bit floating-point
| NaN, and stores this NaN at the location pointed to by `zWPtr'. Argument
| `zWPtr' points to an array of four 32-bit elements that concatenate in the
| Converts the common NaN pointed to by 'aPtr' into a 128-bit floating-point
| NaN, and stores this NaN at the location pointed to by 'zWPtr'. Argument
| 'zWPtr' points to an array of four 32-bit elements that concatenate in the
| platform's normal endian order to form a 128-bit floating-point value.
*----------------------------------------------------------------------------*/
void
@@ -359,10 +360,10 @@ void
/*----------------------------------------------------------------------------
| Assuming at least one of the two 128-bit floating-point values pointed to by
| `aWPtr' and `bWPtr' is a NaN, stores the combined NaN result at the location
| pointed to by `zWPtr'. If either original floating-point value is a
| signaling NaN, the invalid exception is raised. Each of `aWPtr', `bWPtr',
| and `zWPtr' points to an array of four 32-bit elements that concatenate in
| 'aWPtr' and 'bWPtr' is a NaN, stores the combined NaN result at the location
| pointed to by 'zWPtr'. If either original floating-point value is a
| signaling NaN, the invalid exception is raised. Each of 'aWPtr', 'bWPtr',
| and 'zWPtr' points to an array of four 32-bit elements that concatenate in
| the platform's normal endian order to form a 128-bit floating-point value.
*----------------------------------------------------------------------------*/
void


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014, 2015 The Regents of the University of
California. All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.


@@ -2,7 +2,7 @@
/*============================================================================
This C source file is part of the SoftFloat IEEE Floating-Point Arithmetic
Package, Release 3d, by John R. Hauser.
Package, Release 3e, by John R. Hauser.
Copyright 2011, 2012, 2013, 2014 The Regents of the University of California.
All rights reserved.
