ARM platforms have timing issues with pip_stress:
Hello,
pip_stress works out of the box on my x86 based laptop, but
doesn't work on ARM devices, returned 'no inversion incurred'.
Follow the comment to increase usleep value, 2500 worked for
pandaboard and 3000 worked for Beaglebone Black board.
I propose that increase the usleep value to 3500 from upstream,
so that we can use pip_stress right out of the box.
Rather than hardcode the usleep value used by pip_stress, I made
the command line option --usleep which takes a microsecond value
that defaults to 500us.
Reported-by: Chase Qi <chase.qi@linaro.org>
Signed-off-by: Clark Williams <clark.williams@gmail.com>
Currently, the scheduling class is configured on a global
level. It is possible to run the test either with SCHED_FIFO
or SCHED_RR. All threads run then with the same configuration
except sched_priority is different.
By storing the scheduling attributes per thread we will be able
to use different scheduler classes at the same time. The aim is
to use SCHED_DEADLINE for the high priority thread.
First thing to get there is to introduce low_sa, med_sa, high_sa
and admin_sa. They are configured using the global policy variable
on default. Either using SCHED_FIFO or SCHED_RR. The user
can though use --sched command line options to configure each
thread seperately. E.g.
Starting PI Stress Test
Number of thread groups: 1
Duration of test run: infinite
Number of inversions per group: unlimited
Admin thread SCHED_FIFO priority 4
1 groups of 3 threads will be created
High thread SCHED_DEADLINE runtime 100000 deadline 200000 period 200000
Med thread SCHED_FIFO priority 2
Low thread SCHED_FIFO priority 1
Current Inversions: 2446249
Stopping test
Terminated
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Until we have a proper libc implementation we maintain a simple
version of it. We this new API we are able to use SCHED_DEADLINE.
This is shamelessly stolen from Dario Faggioli's libdl.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Dario Faggioli <raistlin@linux.it>
In order to be able to use some of the rt-utils.h function we need
to get rid of our own info() & friends implementation.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
When I boot my 8 core i7 laptop with the maxcpus=4 kernel boot flag,
cyclictest -S runs 8 threads. This patch makes it only use the number
of online cpus instead.
Signed-off-by: Joakim Hernberg <jhernberg@alchemy.lu>
Signed-off-by: John Kacur <jkacur@redhat.com>
Some people running cyclictest on laptops don't want to automatically
take advantage of the trick that prevents the power management to
transition to high cstates, since it eats up their battery power.
Allow them to suppress this feature with --laptop
This will result in power latency results of course.
Feature-requested-by: Joakim Hernberg <jhernberg@alchemy.lu>
Signed-off-by: John Kacur <jkacur@redhat.com>
When I boot my 8 core i7 laptop with the maxcpus=4 kernel boot flag,
cyclictest -S runs 8 threads. This patch makes it only use the number
of online cpus instead.
Signed-off-by: Joakim Hernberg <jhernberg@alchemy.lu>
Signed-off-by: John Kacur <jkacur@redhat.com>
Some people running cyclictest on laptops don't want to automatically
take advantage of the trick that prevents the power management to
transistion to high cstates, since it eats up their battery power.
Allow them to suppress this feature with --laptop
This will result in power latency results of course.
Feature-requested-by: Joakim Hernberg <jhernberg@alchemy.lu>
Signed-off-by: John Kacur <jkacur@redhat.com>
At some point in the history of cyclictest, a number of short options
were removed and changed to long only options. However the display_help
was not updated to reflect this and indicates short options that
no longer exist. Fix this. I also found a long option that wasn't listed
at all and added that.
Signed-off-by: John Kacur <jkacur@redhat.com>
cyclictest can be run from other tools such as rteval
in order to get current status on long runs, SIGUSR1 is sent to
cyclictest and caught by function sighand()
This creates difficulties for rteval when parsing cyclictest output, so
change the output to stderr.
Note, a RFC was sent out on Apr.15 2014 entitled
"RFC: SIGUSR1 to stderr"
to: RT <linux-rt-users@vger.kernel.org>
cc: Carsten Emde <C.Emde@osadl.org>,
Thomas Gleixner <tglx@linutronix.de>,
Clark Williams <williams@redhat.com>
Since I didn't receive any replies, I'm assumin there are no objections
Signed-off-by: John Kacur <jkacur@redhat.com>
In set_latency_target() there are some paths that don't print an error
message even when a write of 0 to /dev/cpu_dma_latency fails.
This patch does the following
- always print an error message if the write to /dev/cpu_dma_latency
fails
- Fix the error check with the write call. (a return of 0 or -1 indicate
problems
- rename ret to err since this function is void and returns no value
- use err_msg_n instead of printf (which also prints to stderr)
Signed-off-by: John Kacur <jkacur@redhat.com>
In rt-tests we try to use const where appropriate for read-only, but
we need to tell the compiler we are intentionally discarding const
when calling library functions that expect char *
Signed-off-by: John Kacur <jkacur@redhat.com>
Ran 2to3 on hwlatdetect.py and checked in the result. Tested
on F20 system running 3.12.14-rt23 with both python2 and python3.
Signed-off-by: Clark Williams <clark.williams@gmail.com>
on ARM I'm seeing output like:
cyclicte-623 0....... 19619418us+: tracing_mark_write: hit latency threshold (2000 > 2097)
That's because of a format mismatch in
tracemark("hit latency threshold (%d > %d)", diff, tracelimit);
diff is a u64 and tracelimit an int. So on ARM the string is passed in r0,
tracelimit in r1 and diff in r2+r3. vsnprintf used in tracemark only
expects two ints passed and so only uses r1 and r2 yielding the permutation
in the output.
This patch also adds a gcc attribute to tracemark that helps catching
similar bugs. In this case just adding the attribute but not touching
the call site, would result in:
src/cyclictest/cyclictest.c: In function ‘timerthread’:
src/cyclictest/cyclictest.c:899:4: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘uint64_t’ [-Wformat]
---
Hello
after some chatting with Clark and John I dropped the c99 stuff and added the
attribute annotation.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Clark Williams <clark.williams@gmail.com>
Hello,
While playing around with hackbench I discovered that I would sometimes
get an enormous time reported, even if the run time would be less than a
second or so. The problem was that the struct timeval start was not
initialized until after all children have been created. But if the
program receives a signal before this is done, the start time is left
uninitialized.
I propose that in such situations an error message be displayed, like
the following patch does.
Please let me know if this is acceptable.
Regards,
/Ciprian
Signed-off-by: Clark Williams <williams@redhat.com>
e.g.
cyclictest -a4,6-8 -t5
will use 5 threads, assigned round-robin to the set of CPUs {4,6,7,8}.
CPU 4 will get threads 1 and 5, CPU 6 gets thread 2, CPU 7 gets thread 3, and
CPU 8 gets thread 4.
As explained in the updated manpage, libnuma >= v2 is required for these
arbitrary CPU sets. With libnuma v1, the -a option behaves as before. As
before, compiling without libnuma is supported. The command usage help is fixed
up at compile time to always show the correct usage of the -a option.
Also note that, since numa_parse_cpustring_all() wasn't available in early
libnuma v2 versions, we use numa_parse_cpustring(). This means you'll have to
use taskset in some cases (isolcpus kernel parameter) to add the desired CPUs to
the set of allowed cores, e.g.:
taskset -c4-6 cyclictest -a4-6
Tested with out libnuma (numactl), and with versions 1.0.2 and 2.0.9-rc3.
Signed-off-by: Aaron Fabbri <ajfabbri@gmail.com>
(cherry picked from commit 5375ab86e77881d8043e5e309bb8daf5a84cc05f)
Signed-off-by: Clark Williams <clark.williams@gmail.com>
These changes make the align option truly optional as claimed.
1. Rename disaligned to offset for readability.
2. Fix the aligned option so that if no optional argument is given,
the offset defaults to 0
3. Fix some white space problems as reported by checkpatch.pl in the kernel
Signed-off-by: John Kacur <jkacur@redhat.com>
This patch provides and additional -A/--align flag to cyclictest to align
thread wakeup times of all threads as closly defined as possible.
When running multiple threads in cyclictest (-S or -t # option) the threads
are launched in an unsynchronized manner. Basically the creation order and
time for thread creation determines the start time. For provoking a maximum
congestion situation (e.g. cache evictions) and to improve reproducibility
or run conditions the start time should be defined distances appart. The
well defined distance is implemented as a offset parameter to -A/--align
and will offset each threads start time by the parameter * the sequentially
assigned thread number (par->tnum), together with the -d0 (distance in the
intervals of the individual threads) this alignment option allows to get
the thread wakeup times as closely synchronized as possible.
The method to sync is simply that the thread with par->tnum == 0 is chosen
to set a globally shared timestamp, and all other threads use this timestamp
as their starting time rather than each calling clock_gettime() at startup.
To ensure synchronization of the thread startup the setting of the global
time is guarded by pthread_barriers.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Reviewed-by: Andreas Platschek <andreas.platschek@opentech.at>
Signed-off-by: Clark Williams <clark.williams@gmail.com>
Change return value from option parsing to be enumerated type
rather than a character. Hopefully this will clean up the option
handling a bit and not confuse me when I come back to add yet
another option to cyclictest.
Signed-off-by: Clark Williams <clark.williams@gmail.com>
Commit ad27df7 ("Reimplement better child tracking and improve error
handling") changed the way of reporting pid/error after creating a
child. It will return an union which is a mix pid_t, pthread_t and a
signed long long for errors.
Now on 32bit x86 both pid_t and pthread_t are four byte in size and are
stored in the first 4 bytes. Now if the most significant bit of the long
long variable happens to be set by chance (because nobody really
initializes the variable here) then error variable will be negative. On
little endian machines the assignment of pid or threadid won't reset the
sign bit and you see this:
| Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
| Each sender will pass 100 messages of 100 bytes
| 0 children started. Expected 40
| sending SIGTERM to all child processes
| signaling 0 worker threads to terminate
| Creating workers (error: Success)
A machine with proper endian handlig (that is big endian) would reset
the sign bit during the assignment of pid and I would not have to make
this patch :)
While here, I make create_worker() since it is not used outside of this
file.
Cc: David Sommerseth <davids@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Clark Williams <clark.williams@gmail.com>
Add the --notrace/-A option, intended to be used in conjunction
with the -b option. This will cause cyclictest to exit when a
threshold is hit, but will not perform any tracing operations,
allowing more sophisticated tracing to be done externally.
Signed-off-by: Clark Williams <clark.williams@gmail.com>
This code adds the -F/--fifo option to cyclictest. Using the
--fifo <path> option will cause cyclictest to create a named
fifo at <path> and will dump the current run statistics to that
fifo when it is opened an read.
Signed-off-by: Clark Williams <clark.williams@gmail.com>
Huge latencies are observed (close to 1 second) when certain
options are used in cyclictest.
The problem was 1st introduced at commit da4956cbca
("use interval on first loop instead of 1 second"). It removed
the 1 second first timing loop out of the main path in cyclictest
but left it in two other paths, namely the ones triggered by
these two options:
-r --relative use relative timer instead of absolute
-s --system use sys_nanosleep and sys_setitimer
which in turn causes the huge latencies of close to 1 second to
be reported by cyclictest with certain uses of those two options.
Here we extend the original commit to remove the 1 second
hardcoded timer values from the RELTIME and ITIMER options, by
simply using the actual interval provided instead.
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: John Kacur <jkacur@redhat.com>
Don't tag copies of files in BUILD created when building an rpm
Without this change tags finds both copies, eg: for tag cyclictest.c
# pri kind tag file
1 F F cyclictest.c BUILD/rt-tests/src/cyclictest/cyclictest.c
1
2 F F cyclictest.c src/cyclictest/cyclictest.c
With this change, only the later one is found
Signed-off-by: John Kacur <jkacur@redhat.com>