Commit Graph

206057 Commits

Author SHA1 Message Date
Richard Earnshaw 439779bace arm: libgcc: provide implementations of __sync_synchronize
Prior to Armv6 there was no architected method to synchronize data
across processors.  Armv6 saw the first introduction of
multi-processor support, using a CP15 operation; but this was
deprecated in Armv7 and is not supported on m-profile devices of any
form.  Armv7 (and armv6-m) and later support data synchronization via
the DMB instruction.

This all leads to difficulties when linking programs as the user
generally needs to know which synchronization method is needed, but
there seems no easy way around this, when there are no OS-related
primitives available.

I've addressed this by adding multiple variants of __sync_synchronize
to libgcc, one for each of the above use cases.  I've named these
__sync_synchronize_none, __sync_synchronize_cp15dmb and
__sync_synchronize_dmb.  I've also added three specs files that can be
used to direct the linker to pick the appropriate implementation.
Using specs fragments for this is preferable to directing the user to
directly use --defsym as the latter has to be placed at the correct
position on the command line to be effective and the spec rule ensures
this automatically.

I've also added a default implementation of __sync_synchronize.  The
default implementation will use DMB if that is available in the target
ISA, or fall back to a null implementation if it isn't.  In the latter
case it will cause the linker (GNU LD) to emit a warning that
specifies how to pick a specific implementation.  I've chosen not to
permit this default to use the CP15 solution as that has been
deprecated.
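
For context, a minimal sketch of how user code typically ends up depending
on this symbol.  It assumes the GCC builtin __sync_synchronize (), which
acts as a full memory barrier and, on targets where no barrier instruction
can be emitted inline, may become a call to the libgcc routine of the same
name.  The names payload, ready, publish and consume are hypothetical and
used only for illustration.

```c
/* Shared state; the names are illustrative only.  */
static int payload;
static volatile int ready;

/* Producer: store the data, then issue a full barrier before raising
   the flag, so that the data is visible before the flag is.  */
void publish (int value)
{
  payload = value;
  __sync_synchronize ();
  ready = 1;
}

/* Consumer: return -1 while nothing has been published; otherwise
   issue a barrier after seeing the flag and read the data.  */
int consume (void)
{
  if (!ready)
    return -1;
  __sync_synchronize ();
  return payload;
}
```

Which of the variants described above actually backs the barrier is then
a link-time decision, which is the part this patch makes selectable.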

libgcc:

	* config.host (arm*-*-eabi* | arm*-*-rtems*):
	Add arm/t-sync to the makefile rules.
	* config/arm/lib1funcs.S (__sync_synchronize_none)
	(__sync_synchronize_cp15dmb, __sync_synchronize_dmb)
	(__sync_synchronize): New functions.
	* config/arm/t-sync: New file.
	* config/arm/sync-none.specs: Likewise.
	* config/arm/sync-dmb.specs: Likewise.
	* config/arm/sync-cp15dmb.specs: Likewise.
2023-11-24 14:15:26 +00:00
Tobias Burnus 1802f64e67 OpenMP: Accept argument to depobj's destroy clause
Since OpenMP 5.2, the destroy clause takes a depend object as its argument;
for the depobj directive, the new argument is optional but, if present,
it must be identical to the directive's argument.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_depobj): Accept optionally an argument
	to the destroy clause.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_depobj): Accept optionally an argument
	to the destroy clause.

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_depobj): Accept optionally an argument
	to the destroy clause.

libgomp/ChangeLog:

	* libgomp.texi (5.2 Impl. Status): An argument to the destroy clause
	is now supported.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/depobj-3.c: New test.
	* gfortran.dg/gomp/depobj-3.f90: New test.
2023-11-24 15:10:49 +01:00
Nathaniel Shead 726723c476 c++: Allow exporting const-qualified namespace-scope variables [PR99232]
By [basic.link] p3.2.1, a non-template non-volatile const-qualified
variable is not necessarily internal linkage in a module declaration,
and rather may have module linkage (or external linkage if it is
exported, see p4.8).

	PR c++/99232

gcc/cp/ChangeLog:

	* decl.cc (grokvardecl): Don't mark variables attached to
	modules as internal.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/pr99232_a.C: New test.
	* g++.dg/modules/pr99232_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2023-11-25 00:55:15 +11:00
Juzhe-Zhong aea337cf74 RISC-V: Fix inconsistency among all vectorization hooks
This patch fixes 200+ ICEs exposed by testing with rv64gc_zve64d.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112694

The root cause is that we disallow poly (1,1) size vectorization in preferred_simd_mode
with the following code:
-      if (TARGET_MIN_VLEN < 128 && TARGET_MAX_LMUL < RVV_M2)
-       return word_mode;

However, we allow poly (1,1) size in the hooks:
TARGET_VECTORIZE_RELATED_MODE
TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES

and also enable it in all vectorization patterns.

I added this to preferred_simd_mode because a poly (1,1) size mode causes an
ICE in can_duplicate_and_interleave_p.

The alternative approach would be to block poly (1,1) size in both the TARGET_VECTORIZE_RELATED_MODE
and TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES hooks and in all vectorization patterns,
which is ugly and requires too many code changes.

Now, after investigation, I find that the loop vectorizer can automatically block poly (1,1)
size vectors in interleave vectorization with this commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=730909fa858bd691095bc23655077aa13b7941a9

So we don't need to worry about ICEs in interleave vectorization, and can allow poly (1,1) size vectors
in vectorization, which fixes 200+ ICEs with the zve64d march.

	PR target/112694

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (preferred_simd_mode): Allow poly_int (1,1) vectors.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr112694-1.c: New test.
2023-11-24 21:28:02 +08:00
Alexander Monakov f9a10e9149 gcc: configure: drop Valgrind 3.1 compatibility
Our system.h and configure.ac try to accommodate valgrind-3.1, but it is
more than 15 years old at this point. As Valgrind-based checking is a
developer-oriented feature, drop the compatibility stuff and streamline
the detection.

gcc/ChangeLog:

	* config.in: Regenerate.
	* configure: Regenerate.
	* configure.ac: Delete manual checks for old Valgrind headers.
	* system.h (VALGRIND_MAKE_MEM_NOACCESS): Delete.
	(VALGRIND_MAKE_MEM_DEFINED): Delete.
	(VALGRIND_MAKE_MEM_UNDEFINED): Delete.
	(VALGRIND_MALLOCLIKE_BLOCK): Delete.
	(VALGRIND_FREELIKE_BLOCK): Delete.
2023-11-24 16:17:45 +03:00
Alexander Monakov ab78426ae7 libcpp: configure: drop unused Valgrind detection
When top-level configure has either --enable-checking=valgrind or
--enable-valgrind-annotations, we want to activate a couple of workarounds
in libcpp. They do not use anything from the Valgrind API, so just
delete all detection.

libcpp/ChangeLog:

	* config.in: Regenerate.
	* configure: Regenerate.
	* configure.ac (ENABLE_VALGRIND_CHECKING): Delete.
	(ENABLE_VALGRIND_ANNOTATIONS): Rename to
	ENABLE_VALGRIND_WORKAROUNDS.  Delete Valgrind header checks.
	* lex.cc (new_buff): Adjust for renaming.
	(_cpp_free_buff): Ditto.
2023-11-24 16:13:56 +03:00
Jakub Jelinek 3eb9cae6d3 i386: Fix ICE during cbranchv16qi4 expansion [PR112681]
The following testcase ICEs, because cbranchv16qi4 expansion calls
ix86_expand_branch with op1 being a pre-AVX unaligned memory and
ix86_expand_branch emits a xorv16qi3 instruction without making sure
the operand predicates are satisfied.
I could manually check whether the argument (or both?) fails to match
the vector_operand predicate (apparently this one or bcst_vector_operand
is used in all integral 16+ byte *xorv*3 instructions) and force it into a
register.  But as all gen_xorv*3 expanders call
ix86_expand_vector_logical_operator, it seems easier to just call that
function, which ensures the right thing happens.  Calling the individual
gen_xorv*3 functions would mean an ugly switch on the modes, and using the
high-level expand_simple_binop here seems too high level to me.

2023-11-24  Jakub Jelinek  <jakub@redhat.com>

	PR target/112681
	* config/i386/i386-expand.cc (ix86_expand_branch): Use
	ix86_expand_vector_logical_operator to expand vector XOR rather than
	gen_rtx_SET on gen_rtx_XOR.

	* gcc.target/i386/sse4-pr112681.c: New test.
2023-11-24 12:13:07 +01:00
Alex Coplan fea27dfd22 rtl-ssa: Add some helpers for removing accesses
This adds some helpers to access-utils.h for removing accesses from an
access_array.  This is needed by the upcoming aarch64 load/store pair
fusion pass.

gcc/ChangeLog:

	* rtl-ssa/access-utils.h (filter_accesses): New.
	(remove_regno_access): New.
	(check_remove_regno_access): New.
	* rtl-ssa/accesses.cc (rtl_ssa::remove_note_accesses_base): Use
	new filter_accesses helper.
2023-11-24 10:58:06 +00:00
Alex Coplan a49befbd2c rtl-ssa: Support for inserting new insns
The upcoming aarch64 load pair pass needs to form store pairs, and can
re-order stores over loads when alias analysis determines this is safe.
In the case that both mem defs have uses in the RTL-SSA IR, and both
stores require re-ordering over their uses, we represent that as
(tentative) deletion of the original store insns and creation of a new
insn, to prevent requiring repeated re-parenting of uses during the
pass.  We then update all mem uses that require re-parenting in one go
at the end of the pass.

To support this, RTL-SSA needs to handle inserting new insns (rather
than just changing existing ones), so this patch adds support for that.

New insns (and new accesses) are temporaries, allocated above a temporary
obstack_watermark, such that the user can easily back out of a change without
awkward bookkeeping.

gcc/ChangeLog:

	* rtl-ssa/accesses.cc (function_info::create_set): New.
	* rtl-ssa/accesses.h (access_info::is_temporary): New.
	* rtl-ssa/changes.cc (move_insn): Handle new (temporary) insns.
	(function_info::finalize_new_accesses): Handle new/temporary
	user-created accesses.
	(function_info::apply_changes_to_insn): Ensure m_is_temp flag
	on new insns gets cleared.
	(function_info::change_insns): Handle new/temporary insns.
	(function_info::create_insn): New.
	* rtl-ssa/changes.h (class insn_change): Make function_info a
	friend class.
	* rtl-ssa/functions.h (function_info): Declare new entry points:
	create_set, create_insn.  Declare new change_alloc helper.
	* rtl-ssa/insns.cc (insn_info::print_full): Identify temporary insns in
	dump.
	* rtl-ssa/insns.h (insn_info): Add new m_is_temp flag and accompanying
	is_temporary accessor.
	* rtl-ssa/internals.inl (insn_info::insn_info): Initialize m_is_temp to
	false.
	* rtl-ssa/member-fns.inl (function_info::change_alloc): New.
	* rtl-ssa/movement.h (restrict_movement_for_defs_ignoring): Add
	handling for temporary defs.
2023-11-24 10:57:51 +00:00
Jakub Jelinek eebcad0ac2 match.pd: Avoid simplification into invalid BIT_FIELD_REFs [PR112673]
The following testcase is lowered by the bitint lowering pass, then
vectorizer vectorizes one of the loops in it, so we have
  vect__18.6_34 = VIEW_CONVERT_EXPR<vector(4) unsigned long>(x_35(D));
  _8 = BIT_FIELD_REF <vect__18.6_34, 64, 0>;
...
  _18 = BIT_FIELD_REF <vect__18.6_34, 64, 64>;
etc. where x_35(D) is _BitInt(256) argument.  That is valid BIT_FIELD_REF,
the first argument is a vector and it extracts the vector elements from it.
Then comes forwprop4 and simplifies that using match.pd into
  _8 = (unsigned long) x_35(D);
...
  _18 = BIT_FIELD_REF <x_35(D), 64, 64>;
and tree-cfg verification ICEs on the latter (though, even the first cast
is kind of undesirable after bitint lowering, we want large/huge bitints
lowered).  The ICE is because if BIT_FIELD_REFs first argument has
INTEGRAL_TYPE_P, we require type_has_mode_precision_p, but that is not the
case of _BitInt(256), it has BLKmode.

The following patch fixes it by doing the BIT_FIELD_REF with VCE to
BIT_FIELD_REF simplification only if the result is valid.

2023-11-24  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/112673
	* match.pd (bit_field_ref (vce @0) -> bit_field_ref @0): Only simplify
	if either @0 doesn't have scalar integral type or if it has mode
	precision.

	* gcc.dg/pr112673.c: New test.
2023-11-24 11:32:28 +01:00
Jakub Jelinek 31669ec1d0 lower-bitint: Lower FLOAT_EXPR from BITINT_TYPE INTEGER_CST [PR112679]
The bitint lowering pass only does something if it sees BITINT_TYPE (medium,
large, huge) SSA_NAMEs.  In the past I've already run into one special case
where the above doesn't work well: if there is a store of medium/large/huge
BITINT_TYPE INTEGER_CST into memory, there might not be any BITINT_TYPE
SSA_NAMEs in the function, yet we need to lower.  This has been solved by
also checking for SSA_NAME_IS_VIRTUAL_OPERAND if at the vdef there isn't
such a store (the whole intent is to make the pass as cheap as possible in the
currently very likely case that the IL doesn't have any BITINT_TYPEs at
all).
And the following testcase shows a similar problem.  With -frounding-math
we don't fold some of FLOAT_EXPRs with INTEGER_CST operands, and if those
INTEGER_CSTs are medium/large/huge BITINT_TYPEs, we need to either cast
the INTEGER_CST to corresponding INTEGER_TYPE (for medium) or lower to
internal fn call which is later turned into libgcc call (for large/huge).
The following patch does that, but of course admittedly this discovery
of stores and FLOAT_EXPRs means we already look through quite a few
SSA_NAME_DEF_STMTs even when BITINT_TYPEs never appear.

2023-11-23  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/112679
	* gimple-lower-bitint.cc (gimple_lower_bitint): Also stop first loop on
	floating point SSA_NAME set in FLOAT_EXPR assignment from BITINT_TYPE
	INTEGER_CST.  Set has_large_huge for those if that BITINT_TYPE is large
	or huge.  Set kind to such FLOAT_EXPR assignment rhs1 BITINT_TYPE's kind.

	* gcc.dg/bitint-42.c: New test.
2023-11-24 11:30:30 +01:00
Richard Biener 9f63a88981 tree-optimization/112677 - stack corruption with .COND_* reduction
The following makes sure to allocate enough space for vectype_op
in vectorizable_reduction.

	PR tree-optimization/112677
	* tree-vect-loop.cc (vectorizable_reduction): Use alloca
	to allocate vectype_op.
2023-11-24 11:25:54 +01:00
Haochen Gui e377a340b3 Clean up by_pieces_ninsns
The by pieces compare can be implemented by overlapped operations, so
it should be taken into consideration when doing the adjustment for
overlap operations.  The mode returned from
widest_fixed_size_mode_for_size is already checked against mov_optab in
by_pieces_mode_supported_p, which is called by
widest_fixed_size_mode_for_size, so there is no need to check mov_optab
again in by_pieces_ninsns.  The patch fixes these issues.

gcc/
	* expr.cc (by_pieces_ninsns): Include by pieces compare when
	doing the adjustment for overlap operations.  Replace mov_optab
	checks with gcc assertion.
2023-11-24 17:16:18 +08:00
Jakub Jelinek 9a96a9e45b lower-bitint: Fix up -fnon-call-exceptions bit-field load lowering [PR112668]
As the following testcase shows, there are some bugs in the
-fnon-call-exceptions bit-field load lowering.  In particular, there
is a case where we want to emit a load early in the initialization
(before m_init_gsi) and because that load might throw exception, need
to split block after the load so that it has an EH edge.
Now, across this splitting, we have m_init_gsi, save_gsi (something
we put back into m_gsi afterwards) statement iterators and m_preheader_bb
which is used to determine the pre-header edge of a loop (if any).
As the testcase shows, both of these statement iterators, and m_preheader_bb
as well, need adjustments if the block was split.  If the stmt iterators
refer to a statement, they need to be updated so that, if the statement is
in the bb after the split, gsi_bb and gsi_seq are updated; otherwise they
ought to be the start of the new (second) bb.
Similarly, m_preheader_bb should be updated to the second bb if it was
the first before.  Other spots where we insert something before m_init_gsi
don't split blocks in there and are fine.

The m_gsi iterator is a normal iterator to insert statements before it,
so gsi_end_p means insert statements at the end of the basic block.
m_init_gsi is, on the other hand, an iterator after which statements should be
inserted (so gsi_end_p means insert statements at the start of the basic block
after labels), but the whole pass is written for insertion of statements before
iterators, so when in 3 spots it wants to insert something after m_init_gsi,
it saves the current iterator to save_gsi and sets m_gsi to gsi_after_labels
if m_init_gsi was gsi_end_p, or to the next statement.  But it actually wasn't
updating m_init_gsi back when switching to the normal iterator; this patch changes
that such that further statements after m_init_gsi will appear after the
set of statements inserted before m_init_gsi.

Finally, the pass had a couple of places where it wanted to create a gsi_end_p
iterator for a particular basic block; instead of doing
m_gsi = gsi_last_bb (bb); if (!gsi_end_p (m_gsi)) gsi_next (&m_gsi);
the pass now uses the new gsi_end_bb function: m_gsi = gsi_end_bb (bb).
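
The difference between the two idioms can be sketched with a toy model.
This is not GCC's gimple iterator API: struct stmt, struct block,
struct iter and every function below are hypothetical stand-ins, assuming
only that an iterator with a null statement pointer means "end of block"
and that inserting before such an iterator appends.

```c
#include <stddef.h>

/* Toy statement sequence; these are not GCC's types.  */
struct stmt { int id; struct stmt *prev, *next; };
struct block { struct stmt *first, *last; };
struct iter { struct block *bb; struct stmt *ptr; }; /* ptr == NULL: end */

struct iter iter_last (struct block *bb)
{
  return (struct iter) { bb, bb->last };
}

int iter_end_p (struct iter it) { return it.ptr == NULL; }

void iter_next (struct iter *it) { it->ptr = it->ptr->next; }

/* Old idiom: take the last statement, then step past it if it exists.  */
struct iter end_via_last (struct block *bb)
{
  struct iter it = iter_last (bb);
  if (!iter_end_p (it))
    iter_next (&it);
  return it;
}

/* New idiom: build the end iterator directly.  */
struct iter iter_end_bb (struct block *bb)
{
  return (struct iter) { bb, NULL };
}

/* Insert S before IT; with an end iterator this appends to the block.  */
void insert_before (struct iter it, struct stmt *s)
{
  struct block *bb = it.bb;
  s->next = it.ptr;
  s->prev = it.ptr ? it.ptr->prev : bb->last;
  if (s->prev)
    s->prev->next = s;
  else
    bb->first = s;
  if (it.ptr)
    it.ptr->prev = s;
  else
    bb->last = s;
}
```

In this model both helpers yield the same iterator for empty and
non-empty blocks; the direct form simply avoids the two-step dance.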

2023-11-24  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/112668
	* gimple-iterator.h (gsi_end, gsi_end_bb): New inline functions.
	* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): After
	temporarily adding statements after m_init_gsi, update m_init_gsi
	such that later additions after it will be after the added statements.
	(bitint_large_huge::handle_load): Likewise.  When splitting
	gsi_bb (m_init_gsi) basic block, update m_preheader_bb if needed
	and update saved m_gsi as well if needed.
	(bitint_large_huge::lower_mergeable_stmt,
	bitint_large_huge::lower_comparison_stmt,
	bitint_large_huge::lower_mul_overflow,
	bitint_large_huge::lower_bit_query): Use gsi_end_bb.

	* gcc.dg/bitint-40.c: New test.
2023-11-24 08:54:40 +01:00
Jakub Jelinek 1c44bd92a8 tree: Fix up try_catch_may_fallthru [PR112619]
The following testcase ICEs with -std=c++98 since r14-5086 because
block_may_fallthru is called on a TRY_CATCH_EXPR whose second operand
is a MODIFY_EXPR rather than STATEMENT_LIST, which try_catch_may_fallthru
apparently expects.
I've been wondering whether that isn't some kind of FE bug and whether
there isn't some unwritten rule that second operand of TRY_CATCH_EXPR
must be a STATEMENT_LIST.  Looking at the FEs, the C++ FE uses mostly its
own trees, TRY_BLOCK (TRY_CATCH_EXPR replacement) with HANDLER in it (CATCH_EXPR
replacement) - but HANDLER can be immediate second operand rather than nested
in STATEMENT_LIST, EH_SPEC_BLOCK (this one stands for both TRY_CATCH_EXPR
and EH_FILTER_EXPR in its second argument); both of these are only replaced
by the generic trees during gimplification though, so are unlikely to be seen
by block_may_fallthru; and then CLEANUP_STMT, which is genericized
into TRY_CATCH_EXPR with non-CATCH_EXPR/EH_FILTER_EXPR in its body (this is
the one that causes the ICE on this testcase).
The Go and Rust FEs create TRY_CATCH_EXPR with CATCH_EXPR immediately in its
second argument (but either are unlucky that block_may_fallthru isn't called
or the body can always fallthru, or latent ICE), while the D FE most likely
hit this ICE and attempts to work around it, by checking at TRY_CATCH_EXPR
creation time if the second argument from pop_stmt_list is STATEMENT_LIST and
if not, forcefully wraps it into a STATEMENT_LIST.

Unfortunately, I don't see an easy way to create an artificial tree iterator
from just a single tree statement, so the patch duplicates what the loops
later do (after all, it is very simple; I just didn't want to also duplicate
the large comments explaining it, hence the 3 "See below." comments).

2023-11-24  Jakub Jelinek  <jakub@redhat.com>

	PR c++/112619
	* tree.cc (try_catch_may_fallthru): If second operand of
	TRY_CATCH_EXPR is not a STATEMENT_LIST, handle it as if it was a
	STATEMENT_LIST containing a single statement.

	* g++.dg/eh/pr112619.C: New test.
2023-11-24 08:54:06 +01:00
Richard Biener a7d82b45ed tree-optimization/112344 - relax final value-replacement fix
The following tries to reduce the number of cases in which we use an unsigned
type for the addition when we know the original signed increment was
OK, which is when the total unsigned increment computed fits the signed
type as well.
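
A scalar sketch of the idea, not the tree-chrec.cc code: final_value is a
hypothetical helper that computes the value of x after niter iterations of
"x += step".  It assumes the GCC/Clang builtin __builtin_mul_overflow to
test whether the total increment fits the signed type, and falls back to
unsigned arithmetic only when it does not.

```c
#include <stdbool.h>

/* Hypothetical model of final value replacement for "x += step".  */
int final_value (int init, int step, unsigned niter, bool *used_unsigned)
{
  int total;

  /* The builtin reports whether the exact product step * niter fits
     in an int.  */
  if (!__builtin_mul_overflow (step, niter, &total))
    {
      /* The overall increment fits the signed type: keep a signed add.  */
      *used_unsigned = false;
      return init + total;
    }

  /* Otherwise do the arithmetic in the unsigned type, where wrap-around
     is well defined, and convert back.  */
  *used_unsigned = true;
  return (int) ((unsigned) init + (unsigned) step * niter);
}
```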

This fixes the observed testsuite fallout.

	PR tree-optimization/112344
	* tree-chrec.cc (chrec_apply): Only use an unsigned add
	when the overall increment doesn't fit the signed type.
2023-11-24 08:49:59 +01:00
Juzhe-Zhong d83013b88b RISC-V: Optimize a special case of VLA SLP
While working on fixing bugs with zvl1024b, I noticed that a special VLA SLP case
can be better optimized.

v = vec_perm (op1, op2, { nunits - 1, nunits, nunits + 1, ... })

Before this patch, we were using the generic approach (vrgather):

vid
vadd.vx
vrgather
vmsgeu
vrgather

With this patch, we use vec_extract + slide1up:

scalar = vec_extract (last element of op1)
v = slide1up (op2, scalar)
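
The equivalence can be checked with a scalar model in C.  This is only a
sketch: perm_generic and perm_slide1up are hypothetical names, plain int
arrays stand in for vectors, and nunits is assumed to be at least 1.

```c
#include <stddef.h>

/* Generic model: select from the concatenation { op1, op2 } using the
   indices nunits - 1, nunits, nunits + 1, ...  */
void perm_generic (const int *op1, const int *op2, int *out, size_t nunits)
{
  for (size_t i = 0; i < nunits; i++)
    {
      size_t sel = nunits - 1 + i;
      out[i] = sel < nunits ? op1[sel] : op2[sel - nunits];
    }
}

/* Specialized model: extract the last element of op1, then shift op2
   up by one position and place the scalar in element 0.  */
void perm_slide1up (const int *op1, const int *op2, int *out, size_t nunits)
{
  int scalar = op1[nunits - 1];
  out[0] = scalar;
  for (size_t i = 1; i < nunits; i++)
    out[i] = op2[i - 1];
}
```

Both functions produce the same output for any inputs of equal length,
which is why the permute can be replaced by the shorter sequence.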

Tested on zvl128b/zvl256b/zvl512b/zvl1024b for both RV32 and RV64 with no regressions.

Ok for trunk ?

	PR target/112599

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns): New function.
	(expand_vec_perm_const_1): Add new optimization.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr112599-2.c: New test.
2023-11-24 14:34:04 +08:00
Juzhe-Zhong af7a422da4 RISC-V: Disable BSWAP optimization for NUNITS < 4
While fixing bugs, I noticed a piece of odd code that looks incorrect
and probably makes codegen worse.

#include <stdint.h>

typedef int8_t vnx2qi __attribute__ ((vector_size (2)));

#define MASK_2(X, Y) (Y) - 1 - (X), (Y) - 2 - (X)

#define PERMUTE(TYPE, NUNITS)                                                  \
  __attribute__ ((noipa)) void permute_##TYPE (TYPE values1, TYPE values2,     \
					       TYPE *out)                      \
  {                                                                            \
    TYPE v                                                                     \
      = __builtin_shufflevector (values1, values2, MASK_##NUNITS (0, NUNITS)); \
    *(TYPE *) out = v;                                                         \
  }

#define TEST_ALL(T)                                                            \
  T (vnx2qi, 2)

TEST_ALL (PERMUTE)

Before this patch:

        vsetivli        zero,2,e8,mf8,ta,ma
        vle8.v  v1,0(a0)
        vsetivli        zero,1,e16,mf4,ta,ma
        vsrl.vi v2,v1,8
        vsll.vi v1,v1,8
        vor.vv  v1,v2,v1
        vsetivli        zero,2,e8,mf8,ta,ma
        vse8.v  v1,0(a2)
        ret

After this patch:

        vsetivli        zero,2,e8,mf8,ta,ma
        vle8.v  v3,0(a0)
        vid.v   v1
        vrsub.vi        v1,v1,1
        vrgather.vv     v2,v3,v1
        vse8.v  v2,0(a2)
        ret

Committed as it is very obvious on code review.

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (shuffle_bswap_pattern): Disable for NUNIT < 4.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adapt test.
	* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Ditto.
2023-11-24 13:08:21 +08:00
Nathaniel Shead cff1fa6625 c++: Support lambdas in static template member initialisers [PR107398]
The testcase noted in the PR fails because the context of the lambda is
not in namespace scope, but rather in class scope. This patch removes
the assertion that the context must be a namespace and ensures that
lambdas in class scope still get the correct merge_kind.

	PR c++/107398

gcc/cp/ChangeLog:

	* module.cc (trees_out::get_merge_kind): Handle lambdas in class
	scope.
	(maybe_key_decl): Remove assertion and fix whitespace.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/lambda-6_a.C: New test.
	* g++.dg/modules/lambda-6_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2023-11-24 13:31:11 +11:00
Haochen Jiang a1f8e65dee i386: Fix AVX512 and AVX10 option issues
gcc/ChangeLog:

	PR target/112643
	* config/i386/driver-i386.cc (check_avx10_avx512_features):
	Renamed to ...
	(check_avx512_features): this and remove avx10 check.
	(host_detect_local_cpu): Never append -mno-avx10.1-{256,512} to
	avoid emitting warnings when building GCC with native arch.
	* config/i386/i386-builtin.def (BDESC): Add missing AVX512VL for
	128/256 bit builtin for AVX512VP2INTERSECT.
	* config/i386/i386-options.cc (ix86_option_override_internal):
	Also check whether the AVX512 flags are set when trying to reset.
	* config/i386/i386.h
	(PTA_SKYLAKE_AVX512): Add missing PTA_EVEX512.
	(PTA_ZNVER4): Ditto.
2023-11-24 10:02:14 +08:00
Nathaniel Shead d89903ff29 c++: check mismatching exports for class tags [PR98885]
The check for exporting a declaration that was previously declared as not
exported is implemented in 'duplicate_decls', but this doesn't handle
declarations of classes. This patch adds these checks and slightly
adjusts the associated error messages for clarity.

	PR c++/98885

gcc/cp/ChangeLog:

	* decl.cc (duplicate_decls): Adjust error message.
	(xref_tag): Adjust error message. Check exporting decl that is
	already declared as non-exporting.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/export-1.C: Adjust error messages. Remove
	xfails for working case. Add new test case.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2023-11-24 12:30:07 +11:00
GCC Administrator 6fb55db0e1 Daily bump.
2023-11-24 00:17:53 +00:00
Nathaniel Shead 7572fa2b58 MAINTAINERS: Add myself to write after approval and DCO
ChangeLog:

	* MAINTAINERS: Add myself to write after approval and DCO

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2023-11-24 10:43:54 +11:00
Hans-Peter Nilsson 4eafb9748b contrib/regression/btest-gcc.sh: Optionally handle XPASS.
Tests with keys that match both PASS and FAIL (or now,
optionally, XPASS) count as fail.  XPASSes were previously
ignored.  Handling them as FAIL seems the most useful
alternative, but not counting XPASSes may be deliberate.
It's also a matter of compatibility, so make it optional.

Attempts to use --handle-xpass-as-fail were previously
flagged as a usage error.  If you pass it now, on a state with
previous mixed XPASS and PASS results that don't change in
this run, the XPASS is discovered as a (new) regression.
For new XPASSing tests, it's handled as a new FAIL.

	* btest-gcc.sh (--handle-xpass-as-fail): New option.
2023-11-24 00:21:31 +01:00
Hans-Peter Nilsson 071dadb728 contrib/regression/btest-gcc.sh: Simplify option handling.
	* btest-gcc.sh (Option handling): Break out shifts from each
	option alternative.
2023-11-24 00:21:08 +01:00
Hans-Peter Nilsson 0ca1e90ae1 contrib/regression/btest-gcc.sh: Handle multiple options.
This is a long-standing bug: passing "-j --add-passes-despite-regression"
or "--add-passes-despite-regression -j" caused the second option to be
treated as TARGET, the first non-option parameter.

	* btest-gcc.sh (Option handling): Handle multiple options.
2023-11-24 00:20:42 +01:00
John David Anglin f33a4a7f74 hppa: Fix g++.dg/modules/bad-mapper-1.C on hpux
2023-11-23  John David Anglin  <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

	* g++.dg/modules/bad-mapper-1.C: Add hppa*-*-hpux* to dg-error
	"-:failed mapper handshake communication" targets.
2023-11-23 20:46:27 +00:00
John David Anglin 84e0ed920c hppa: Fix gcc.dg/analyzer/fd-4.c on hpux
2023-11-23  John David Anglin  <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

	* gcc.dg/analyzer/fd-4.c: Define _MODE_T on hpux.
2023-11-23 20:29:27 +00:00
John David Anglin 0632342e0d hppa: Export main in pr104869.C on hpux
This is needed to avoid a linker warning.

2023-11-23  John David Anglin  <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

	* g++.dg/pr104869.C: Export main on hpux.
2023-11-23 20:19:57 +00:00
Iain Sandoe 3a51dc3fc0 testsuite, lib: Re-allow multiple function start labels.
The change applied in r14-5760-g2a46e0e7e20 changed the behaviour of
functions with assembly like:

bar:
__acle_se_bar:

Where both bar and __acle_se_bar are globals referring to the same
function body.  The old behaviour overrides 'bar' with '__acle_se_bar'
and the scan tests for that label.

The change here re-allows the override.

Cases like this are not legal Mach-O (where two global symbols cannot
have the same address in the assembler output).  However, given the
constraints on the Mach-O scanning, it does not seem that it is
necessary to skip the change (any incorrect case should be easily
evident in the assembler).

gcc/testsuite/ChangeLog:

	* lib/scanasm.exp: Allow multiple function start symbols,
	taking the last as the function name.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-11-23 19:54:04 +00:00
Harald Anlauf 7646b5d880 testsuite: fortran: fix invalid testcases (missing MOLD argument to NULL)
The Fortran standard requires that NULL() passed to an assumed-rank
dummy argument has a MOLD argument.

gcc/testsuite/ChangeLog:

	PR fortran/104819
	* gfortran.dg/assumed_rank_10.f90: Add MOLD argument to NULL().
	* gfortran.dg/assumed_rank_8.f90: Likewise.
2023-11-23 19:07:16 +01:00
Harald Anlauf 0c2ecfd4a2 Fortran: restrictions on integer arguments to SYSTEM_CLOCK [PR112609]
Fortran 2023 added restrictions on integer arguments to SYSTEM_CLOCK: they
must have a decimal exponent range at least as large as that of a default
integer, and all integer arguments must have the same kind type parameter.

gcc/fortran/ChangeLog:

	PR fortran/112609
	* check.cc (gfc_check_system_clock): Add checks on integer arguments
	to SYSTEM_CLOCK specific to F2023.
	* error.cc (notify_std_msg): Adjust to handle new features added
	in F2023.
	* gfortran.texi (_gfortran_set_options): Document GFC_STD_F2023_DEL,
	remove obsolete option GFC_STD_F2008_TS and fix enumeration values.
	* libgfortran.h (GFC_STD_F2023_DEL): Add and use in GFC_STD_OPT_F23.
	* options.cc (set_default_std_flags): Add GFC_STD_F2023_DEL.

gcc/testsuite/ChangeLog:

	PR fortran/112609
	* gfortran.dg/system_clock_1.f90: Add option -std=f2003.
	* gfortran.dg/system_clock_3.f08: Add option -std=f2008.
	* gfortran.dg/system_clock_4.f90: New test.
2023-11-23 19:07:16 +01:00
Georg-Johann Lay 9a3c40af7f AVR: PR target/86776: Implement CVE-2017-5753.
gcc/
	PR target/86776
	* config/avr/avr.cc (TARGET_HAVE_SPECULATION_SAFE_VALUE): Define
	to speculation_safe_value_not_needed.
2023-11-23 19:04:19 +01:00
John David Anglin 01412f0980 hppa: xfail scan-assembler-not check in g++.dg/cpp0x/initlist-const1.C
2023-11-23  John David Anglin  <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/initlist-const1.C: xfail scan-assembler-not
	check on hppa*-*-hpux*.
2023-11-23 17:53:49 +00:00
Jonathan Wakely 7a6a29c455 libstdc++: Define std::ranges::to for C++23 (P1206R7) [PR111055]
This adds the std::ranges::to functions for C++23. The rest of P1206R7
is not yet implemented, i.e. the new constructors taking the
std::from_range tag, and the new insert_range, assign_range, etc. member
functions. std::ranges::to works with the standard containers even
without the new constructors, so this is useful immediately.

The __cpp_lib_ranges_to_container feature test macro can be defined now,
because that only indicates support for the changes in <ranges>, which
are implemented by this patch. The __cpp_lib_containers_ranges macro
will be defined once all containers support the new member functions.

libstdc++-v3/ChangeLog:

	PR libstdc++/111055
	* include/bits/ranges_base.h (from_range_t): Define new tag
	type.
	(from_range): Define new tag object.
	* include/bits/version.def (ranges_to_container): Define.
	* include/bits/version.h: Regenerate.
	* include/std/ranges (ranges::to): Define.
	* testsuite/std/ranges/conv/1.cc: New test.
	* testsuite/std/ranges/conv/2_neg.cc: New test.
	* testsuite/std/ranges/conv/version.cc: New test.
2023-11-23 17:48:41 +00:00
Jonathan Wakely 0585daf7de libstdc++: Fix access error in __gnu_test::uneq_allocator
The operator== function is only a friend of the LHS argument, so cannot
access the private member of the RHS argument. Use the public accessor
instead.

libstdc++-v3/ChangeLog:

	* testsuite/util/testsuite_allocator.h (uneq_allocator): Fix
	equality operator for heterogeneous comparisons.
2023-11-23 17:44:26 +00:00
John David Anglin dc2dfda0ec Don't skip check for warning at line 411 in Wattributes.c on hppa*64*-*-*
2023-11-23  John David Anglin  <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

	* c-c++-common/Wattributes.c: Don't skip check for warning
	at line 411 in Wattributes.c on hppa*64*-*-*.
2023-11-23 17:39:15 +00:00
Marek Polacek 24592abd68 gcc: Introduce -fhardened
In <https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628748.html>
I proposed -fhardened, a new umbrella option that enables a reasonable set
of hardening flags.  The read of the room seems to be that the option
would be useful.  So here's a patch implementing that option.

Currently, -fhardened enables:

  -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
  -D_GLIBCXX_ASSERTIONS
  -ftrivial-auto-var-init=zero
  -fPIE  -pie  -Wl,-z,relro,-z,now
  -fstack-protector-strong
  -fstack-clash-protection
  -fcf-protection=full (x86 GNU/Linux only)

-fhardened will not override options that were specified on the command line
(before or after -fhardened).  For example,

     -D_FORTIFY_SOURCE=1 -fhardened

means that _FORTIFY_SOURCE=1 will be used.  Similarly,

      -fhardened -fstack-protector

will not enable -fstack-protector-strong.
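
The precedence rule can be modelled roughly as follows. This is a toy
sketch that needs C++17 and is not GCC's option machinery: the Options
struct, its field names and the resolve function are all hypothetical, and
the fortify default is fixed at 3 for simplicity.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy model of the settings involved; an empty optional means the user
// did not specify that setting on the command line.
struct Options
{
  std::optional<int> fortify_level;            // -D_FORTIFY_SOURCE=N
  std::optional<std::string> stack_protector;  // "default", "strong", ...
  bool hardened = false;                       // -fhardened was given
};

// Apply the hardened defaults only to settings the user left unset.
// Running this after all arguments are parsed makes the result
// independent of where -fhardened appeared.
Options resolve(Options o)
{
  if (o.hardened)
    {
      if (!o.fortify_level)
        o.fortify_level = 3;
      if (!o.stack_protector)
        o.stack_protector = "strong";
    }
  return o;
}
```

With fortify_level preset to 1, resolve keeps 1 and still fills in the
stack protector default, matching the first example above.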

Currently, -fhardened is only supported on GNU/Linux.

In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
to anything.  This patch provides -Whardened, enabled by default, which
warns when -fhardened couldn't enable a particular option.  I think most
often it will say that _FORTIFY_SOURCE wasn't enabled because optimizations
were not enabled.

gcc/c-family/ChangeLog:

	* c-opts.cc: Include "target.h".
	(c_finish_options): Maybe cpp_define _FORTIFY_SOURCE
	and _GLIBCXX_ASSERTIONS.

gcc/ChangeLog:

	* common.opt (Whardened, fhardened): New options.
	* config.in: Regenerate.
	* config/bpf/bpf.cc: Include "opts.h".
	(bpf_option_override): If flag_stack_protector_set_by_fhardened_p, do
	not inform that -fstack-protector does not work.
	* config/i386/i386-options.cc (ix86_option_override_internal): When
	-fhardened, maybe enable -fcf-protection=full.
	* config/linux-protos.h (linux_fortify_source_default_level): Declare.
	* config/linux.cc (linux_fortify_source_default_level): New.
	* config/linux.h (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Redefine.
	* configure: Regenerate.
	* configure.ac: Check if the linker supports '-z now' and '-z relro'.
	Check if -fhardened is supported on $target_os.
	* doc/invoke.texi: Document -fhardened and -Whardened.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Add.
	* gcc.cc (driver_handle_option): Remember if any link options or -static
	were specified on the command line.
	(process_command): When -fhardened, maybe enable -pie and
	-Wl,-z,relro,-z,now.
	* opts.cc (flag_stack_protector_set_by_fhardened_p): New global.
	(finish_options): When -fhardened, enable
	-ftrivial-auto-var-init=zero and -fstack-protector-strong.
	(print_help_hardened): New.
	(print_help): Call it.
	* opts.h (flag_stack_protector_set_by_fhardened_p): Declare.
	* target.def (fortify_source_default_level): New target hook.
	* targhooks.cc (default_fortify_source_default_level): New.
	* targhooks.h (default_fortify_source_default_level): Declare.
	* toplev.cc (process_options): When -fhardened, enable
	-fstack-clash-protection.  If flag_stack_protector_set_by_fhardened_p,
	do not warn that -fstack-protector not supported for this target.
	Don't enable -fhardened when !HAVE_FHARDENED_SUPPORT.

gcc/testsuite/ChangeLog:

	* gcc.misc-tests/help.exp: Test -fhardened.
	* c-c++-common/fhardened-1.S: New test.
	* c-c++-common/fhardened-1.c: New test.
	* c-c++-common/fhardened-10.c: New test.
	* c-c++-common/fhardened-11.c: New test.
	* c-c++-common/fhardened-12.c: New test.
	* c-c++-common/fhardened-13.c: New test.
	* c-c++-common/fhardened-14.c: New test.
	* c-c++-common/fhardened-15.c: New test.
	* c-c++-common/fhardened-2.c: New test.
	* c-c++-common/fhardened-3.c: New test.
	* c-c++-common/fhardened-4.c: New test.
	* c-c++-common/fhardened-5.c: New test.
	* c-c++-common/fhardened-6.c: New test.
	* c-c++-common/fhardened-7.c: New test.
	* c-c++-common/fhardened-8.c: New test.
	* c-c++-common/fhardened-9.c: New test.
	* gcc.target/i386/cf_check-6.c: New test.
2023-11-23 11:54:17 -05:00
Jose E. Marchesi 2eb833534c libgcc: mark __hardcfr_check_fail as always_inline
The function __hardcfr_check_fail in hardcfr.c is internal and static
inline.  It receives many arguments, which require more than five
registers to be passed in bpf-none-unknown targets.  BPF is limited to
that number of registers to pass arguments, and therefore libgcc fails
to build in that target.  This patch marks the function with the
always_inline attribute, fixing the bpf build.

Tested in bpf-unknown-none target and x86_64-linux-gnu host.

libgcc/ChangeLog:

	* hardcfr.c (__hardcfr_check_fail): Mark as always_inline.
2023-11-23 17:31:40 +01:00
Maciej W. Rozycki ba0869323e testsuite: Fix subexpressions with `scan-assembler-times'
We have an issue with `scan-assembler-times' handling expressions using
subexpressions as produced by capturing parentheses `()' in an odd way,
and one that is inconsistent with `scan-assembler', `scan-assembler-not',
etc.  The problem comes from calling `regexp' with `-inline -all', which
causes a list to be returned that would otherwise be placed in match
variables.

Consequently if we have say:

/* { dg-final { scan-assembler-times "\\s(foo|bar)\\s" 1 } } */

in a test case and there is a lone `foo' present in output being matched,
then our invocation of `regexp -inline -all' in `scan-assembler-times'
will return:

{ foo } foo

and that in turn will confuse our match count calculation as `llength'
will return 2 rather than 1, making the test fail even though `foo' was
only actually matched once.

It seems unclear why we chose to call `regexp' in such an odd way in the
first place just to figure out the number of matches.  The first version
of TCL that supports the `-all' option to `regexp' is 8.3, and according
to its documentation[1][2] `regexp' already returns the number of matches
found whenever `-all' has been used *unless* `-inline' has also been used.

Remove the `-inline' option then along with the `llength' invocation.
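
The double counting can be reproduced outside Tcl.  The sketch below uses
C++ std::regex purely as an analogy: count_flattened imitates taking
`llength' of the `-inline -all' result, and both function names are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Number of times the whole expression matches.
std::size_t count_matches(const std::string& text, const std::regex& re)
{
  std::sregex_iterator first(text.begin(), text.end(), re), last;
  return static_cast<std::size_t>(std::distance(first, last));
}

// Length of a flattened result list: every match contributes the full
// match plus one element per capturing group.
std::size_t count_flattened(const std::string& text, const std::regex& re)
{
  std::size_t n = 0;
  for (std::sregex_iterator it(text.begin(), text.end(), re), last;
       it != last; ++it)
    n += it->size();
  return n;
}
```

For "\\s(foo|bar)\\s" and a single `foo' in the input the two counts
differ; with non-capturing parentheses, "\\s(?:foo|bar)\\s", they agree.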

References:

[1] "Tcl Built-In Commands - regexp manual page",
    <https://www.tcl.tk/man/tcl8.2.3/TclCmd/regexp.html>

[2] "Tcl Built-In Commands - regexp manual page",
    <https://www.tcl.tk/man/tcl8.3/TclCmd/regexp.html>

	gcc/testsuite/
	* lib/scanasm.exp (scan-assembler-times): Remove the `-inline'
	option to `regexp' and the wrapping `llength' call.
2023-11-23 16:13:59 +00:00
Maciej W. Rozycki 6ab2ae97fc AArch64/testsuite: Use non-capturing parentheses with ccmp_1.c
Use non-capturing parentheses for the subexpressions used with
`scan-assembler-times', to avoid a quirk with double-counting.

	gcc/testsuite/
	* gcc.target/aarch64/ccmp_1.c: Use non-capturing parentheses
	with `scan-assembler-times'.
2023-11-23 16:13:58 +00:00
Maciej W. Rozycki a74b9be0bb ARM/testsuite: Use non-capturing parentheses with pr53447-5.c
Use non-capturing parentheses for the subexpressions used with
`scan-assembler-times', to avoid a quirk with double-counting.

	gcc/testsuite/
	* gcc.target/arm/pr53447-5.c: Use non-capturing parentheses with
	`scan-assembler-times'.
2023-11-23 16:13:58 +00:00
Christophe Lyon b9dbdefac6 arm: [MVE intrinsics] Add default clause to full_width_access::memory_vector_mode
My recent commit 0c2037d9d9 added a
switch statement lacking a default clause, leading to warnings or
errors when building with --enable-werror-always.

Fix by adding an empty default.

Committed as obvious.

2023-11-23  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm-mve-builtins-functions.h
	(full_width_access::memory_vector_mode): Add default clause.
2023-11-23 15:54:32 +00:00
Uros Bizjak b2d17bdd45 i386: Wrong code with __builtin_parityl [PR112672]
gen_parityhi2_cmp instruction clobbers its input operand, so use
a temporary register in the call to gen_parityhi2_cmp.

	PR target/112672

gcc/ChangeLog:

	* config/i386/i386.md (parityhi2):
	Use temporary register in the call to gen_parityhi2_cmp.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr112672.c: New test.
2023-11-23 16:17:57 +01:00
Uros Bizjak 2f3f8952ff i386: Fix ICE with -mforce-indirect-call and -fsplit-stack [PR89316]
With the above two options, use a temporary register regno (as returned
from split_stack_prologue_scratch_regno) as an indirect call scratch
register to hold __morestack function address.  On 64-bit targets, two
temporary registers are always available, so load the function address in
%r11 and call __morestack_large_model with its one-argument-register value
in %r10.  On 32-bit targets, bail out with a "sorry" if the temporary
register can not be obtained.

On 32-bit targets, also emit PIC sequence that re-uses the obtained indirect
call scratch register before moving the function address to it.  We can
not set up %ebx PIC register in this case, but __morestack is prepared
for this situation and sets it up by itself.

	PR target/89316

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_expand_split_stack_prologue): Obtain
	scratch regno when flag_force_indirect_call is set.  On 64-bit
	targets, call __morestack_large_model when flag_force_indirect_call
	is set and on 32-bit targets with -fpic, manually expand PIC sequence
	to call __morestack.  Move the function address to an indirect
	call scratch register.

gcc/testsuite/ChangeLog:

	* g++.target/i386/pr89316.C: New test.
	* gcc.target/i386/pr112605-1.c: New test.
	* gcc.target/i386/pr112605-2.c: New test.
	* gcc.target/i386/pr112605.c: New test.
2023-11-23 16:08:50 +01:00
Sebastian Huber 8674d70ce3 gcov: No atomic ops for -fprofile-update=single
gcc/ChangeLog:

	PR tree-optimization/112678

	* tree-profile.cc (tree_profiling): Do not use atomic operations
	for -fprofile-update=single.
2023-11-23 15:54:43 +01:00
Juergen Christ 466b100e5f s390: implement flags output
Implement flags output for inline assemblies.  Only use one output constraint
that captures the whole condition code.  No breakout into different condition
codes is allowed.  Also, only one condition code variable is allowed.

Add further logic to canonicalize various cases where we combine different
cases of possible condition codes.

gcc/ChangeLog:

	* config/s390/s390-c.cc (s390_cpu_cpp_builtins): Define
	__GCC_ASM_FLAG_OUTPUTS__.
	* config/s390/s390.cc (s390_canonicalize_comparison): More
	UNSPEC_CC_TO_INT cases.
	(s390_md_asm_adjust): Implement flags output.
	* config/s390/s390.md (ccstore4): Allow mask operands.
	* doc/extend.texi: Document flags output.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/ccor.c: New test.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
2023-11-23 15:32:17 +01:00
Juergen Christ 111b5555c7 s390: split int128 load
Issue two loads when using GPRs instead of one load-multiple.

Bootstrapped and tested on s390.  OK for mainline?

gcc/ChangeLog:

	* config/s390/s390.md: Split TImode loads.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/int128load.c: New test.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
2023-11-23 15:32:00 +01:00
Juergen Christ 2add85eeb0 s390: Fix ICE in testcase pr89233
When using GNU vector extensions, an access outside of the vector size
caused an ICE on s390.  Fix this by aligning with the vec_extract
builtin, i.e., computing constant index modulo number of lanes.
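
The wrapping behaviour can be sketched in portable C++.  This models only
the index arithmetic, not the s390 back end; the helper name extract_lane
is hypothetical and std::array stands in for a GNU vector type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: reduce the index modulo the number of lanes so an
// out-of-range constant index selects a valid lane instead of failing.
template<typename T, std::size_t N>
constexpr T extract_lane(const std::array<T, N>& v, std::size_t index)
{
  static_assert(N > 0, "vector must have at least one lane");
  return v[index % N];
}
```

With four lanes, index 5 selects the same lane as index 1.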

Fixes testcase gcc.target/s390/pr89233.c.

gcc/ChangeLog:

	* config/s390/vector.md (*vec_extract): Fix.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
2023-11-23 15:30:48 +01:00
Di Zhao 746344dd53 swap ops in reassoc to reduce cross backedge FMA
Previously for ops.length >= 3, when FMA is present, we don't
rank the operands so that more FMAs can be preserved. But this
brings more FMAs with loop dependency, which leads to worse
performance on some targets.

Rank the operands (set width=2) when:
1. avoid_fma_max_bits is set.
2. And loop dependent FMA sequence is found.

In this way, we don't have to discard all the FMA candidates
in the bad shaped sequence in widening_mul, instead we can keep
fewer FMAs without loop dependency.

With this patch, there's about 2% improvement in 510.parest_r
1-copy run on ampere1 (with "-Ofast -mcpu=ampere1 -flto
--param avoid-fma-max-bits=512").

PR tree-optimization/110279

gcc/ChangeLog:

	* tree-ssa-reassoc.cc (get_reassociation_width): check
	for loop dependent FMAs.
	(reassociate_bb): For 3 ops, refine the condition to call
	swap_ops_for_binary_stmt.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr110279-1.c: New test.
2023-11-23 20:56:31 +08:00