Sample_Results.txt
Analysis of Sample Results for Dispatch_Compare
The following results were obtained by running the benchmark on Mac OS X v10.6 Snow Leopard:
$ Dispatch_Compared -t 60 -m 1000000 -f 16
Benchmark averaged over: 60 seconds
CPU speed: 2.66 GHz
Iterate maximum of: 1000000 times
Work function folded: 16 times
Actual results may vary greatly depending on the configuration of the machine the benchmark is run on.
There are several salient points to observe about these results:
1. Queuing work is much faster than forking a new thread: over 100X faster per item when launching 8 or more at once.
2. For this kind of looping, dispatch_apply is always faster than manually creating blocks and queues. In addition, when there is only a single iteration, dispatch_apply takes a fast path that runs it on the current thread with virtually no overhead.
3. OpenMP has a large initial overhead, probably because it always spins up at least one new thread. GCD avoids that cost by using a system-wide thread pool; otherwise, the two perform very similarly for this type of problem.
4. Unlike with threads, the bulk of GCD's time is usually spent in user space (USER) rather than in the kernel (SYS), which enables more efficient scheduling.
5. On this machine, using concurrency (via dispatch_apply) becomes faster than a simple "for loop" once the total work takes around 40 microseconds. The crossover point could occur sooner with appropriate "striding" of the computation.
6. Creating lots of queues, though bad programming practice, is nonetheless quite cheap, and for small workloads it is actually faster than using a single concurrent queue (presumably because all the queues run on the parent thread). However, it is never the optimal solution: use a single dispatch_async for small workloads and a concurrent queue for large ones.
7. Explicitly creating threads is quite expensive: around 20 microseconds each on this machine, and much more when creating many of them. In a real application you would need to avoid creating more threads than absolutely necessary, and the "right" number would vary depending on the hardware involved and on what other applications were running.
Note that if you increase the number of folds (and thus the computation time of the work function), the crossover points from serial to parallel will occur much sooner. The specific crossover points may also vary dramatically on different machines. In most cases, however, the time spent on short iterations is inconsequential, so you should worry more about optimizing for many iterations or for large amounts of calculation per iteration.

Sample Results for Dispatch_Compare
$ /Users/Shared/Build/Release/Dispatch_Compared -t 60 -m 1000000 -f 16
Benchmark averaged over: 60 seconds
CPU speed: 2.66 GHz
Iterate maximum of: 1000000 times
Work function folded: 16 times

ASYNCHRONOUS: Microseconds to *initiate* execution (avg. over 60 seconds)

µsecs±error/1 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
1.15± 0.43/alloc = 1.15±0.43 [ +0%] 0.483u + 0.6628s [ 0%]
2.19± 0.18/array = 2.19±0.18 [ -48%] 1.465u + 0.6663s [ 86%]
2.12± 0.22/dsptch_f = 2.12±0.22 [ -46%] 1.354u + 4.102s [ 376%]
2.44± 0.57/dispatch = 2.44±0.57 [ -53%] 1.623u + 4.47s [ 432%]
18.71± 4.52/fork = 18.7±4.5 [ -94%] 3.183u + 18.29s [ 1,774%]

µsecs±error/2 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.62± 0.19/alloc = 1.24±0.37 [ +0%] 0.5449u + 0.695s [ 0%]
1.57± 0.09/array = 3.15±0.17 [ -61%] 2.364u + 0.7167s [ 148%]
1.45± 0.18/dsptch_f = 2.91±0.37 [ -57%] 2.089u + 5.282s [ 494%]
2.01± 0.26/dispatch = 4.02±0.53 [ -69%] 3.194u + 5.349s [ 589%]
20.27± 6.00/fork = 40.5±12 [ -97%] 12.82u + 52.21s [ 5,145%]

µsecs±error/4 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.35± 0.15/alloc = 1.42±0.59 [ +0%] 0.7u + 0.7147s [ 0%]
1.10± 0.16/array = 4.41±0.65 [ -68%] 3.615u + 0.7172s [ 206%]
0.86± 0.37/dsptch_f = 3.44±1.5 [ -59%] 2.605u + 5.075s [ 443%]
1.28± 0.18/dispatch = 5.1±0.71 [ -72%] 4.238u + 5.362s [ 579%]
43.56±10.65/fork = 174±43 [ -99%] 39.68u + 342.8s [ 26,937%]

µsecs±error/8 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.22± 0.15/alloc = 1.79±1.2 [ +0%] 1.028u + 0.7636s [ 0%]
0.92± 0.03/array = 7.35±0.27 [ -76%] 6.53u + 0.7135s [ 304%]
0.58± 0.06/dsptch_f = 4.62±0.5 [ -61%] 3.798u + 5.202s [ 402%]
0.90± 0.18/dispatch = 7.18±1.4 [ -75%] 6.307u + 5.137s [ 539%]
98.19±11.87/fork = 785±95 [ -100%] 104u + 1,463s [ 87,351%]

µsecs±error/16 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.17± 3.94/alloc = 2.74±63 [ +0%] 1.768u + 0.8645s [ 0%]
0.80± 0.07/array = 12.9±1.2 [ -79%] 12.03u + 0.7144s [ 384%]
0.42± 0.04/dsptch_f = 6.79±0.64 [ -60%] 5.975u + 5.219s [ 325%]
0.68± 0.12/dispatch = 10.9±2 [ -75%] 10.07u + 5.222s [ 481%]
162.99±12.96/fork = 2.61e+03±2.1e+02 [ -100%] 234.7u + 4,240s [169,892%]

µsecs±error/32 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.16± 4.99/alloc = 4.96±1.6e+02 [ +0%] 3.353u + 1.068s [ 0%]
0.71± 0.04/array = 22.7±1.2 [ -78%] 21.84u + 0.7253s [ 411%]
0.33± 0.04/dsptch_f = 10.6±1.3 [ -53%] 13.48u + 5.648s [ 333%]
0.79± 0.23/dispatch = 25.2±7.2 [ -80%] 29.05u + 6.58s [ 706%]
236.52±13.09/fork = 7.57e+03±4.2e+02 [ -100%] 503.1u + 1.111e+04s [262,671%]

µsecs±error/64 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.22±11.59/alloc = 14.4±7.4e+02 [ +0%] 6.385u + 1.526s [ 0%]
0.66± 0.24/array = 42.4±15 [ -66%] 41.55u + 0.7312s [ 435%]
0.34± 0.04/dsptch_f = 21.8±2.6 [ -34%] 31.65u + 6.08s [ 377%]
0.63± 0.15/dispatch = 40.1±9.7 [ -64%] 47.4u + 6.196s [ 577%]
295.36±15.24/fork = 1.89e+04±9.8e+02 [ -100%] 1,039u + 2.622e+04s [344,415%]

µsecs±error/128 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.27±11.49/alloc = 34.2±1.5e+03 [ +0%] 12.19u + 2.231s [ 0%]
0.64± 0.04/array = 82.1±5.7 [ -58%] 81.22u + 0.7739s [ 469%]
0.45± 0.07/dsptch_f = 57.8±9.4 [ -41%] 80.36u + 6.816s [ 504%]
0.73± 0.28/dispatch = 93.2±35 [ -63%] 114.4u + 6.681s [ 739%]
311.40±14.49/fork = 3.99e+04±1.9e+03 [ -100%] 2,084u + 5.462e+04s [393,081%]

µsecs±error/256 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.12± 1.75/alloc = 30.9±4.5e+02 [ +0%] 23.92u + 3.6s [ 0%]
0.62± 0.49/array = 158±1.3e+02 [ -80%] 157u + 0.8497s [ 474%]
0.46± 0.07/dsptch_f = 116±19 [ -73%] 157.5u + 11.05s [ 512%]
0.68± 0.58/dispatch = 173±1.5e+02 [ -82%] 208.5u + 9.335s [ 692%]
312.23±29.57/fork = 7.99e+04±7.6e+03 [ -100%] 3,948u + 1.085e+05s [408,529%]

µsecs±error/512 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.14± 2.36/alloc = 74±1.2e+03 [ +0%] 47.22u + 6.607s [ 0%]
0.62± 0.04/array = 317±21 [ -77%] 315.3u + 0.9726s [ 488%]
0.43± 0.05/dsptch_f = 222±26 [ -67%] 306.1u + 14.11s [ 495%]
0.70± 0.86/dispatch = 359±4.4e+02 [ -79%] 434.4u + 12.66s [ 731%]
341.74±15.05/fork = 1.75e+05±7.7e+03 [ -100%] 8,332u + 2.336e+05s [449,352%]

µsecs±error/1,024 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.13± 1.37/alloc = 132±1.4e+03 [ +0%] 94.01u + 11.95s [ 0%]
0.62± 0.30/array = 630±3.1e+02 [ -79%] 627.5u + 1.068s [ 493%]
0.45± 0.06/dsptch_f = 463±64 [ -72%] 640.3u + 23.73s [ 527%]
0.69± 1.64/dispatch = 709±1.7e+03 [ -81%] 843.7u + 21.28s [ 716%]
360.52±15.05/fork = 3.69e+05±1.5e+04 [ -100%] 1.658e+04u + 4.848e+05s [473,091%]

µsecs±error/2,048 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.14± 1.10/alloc = 284±2.3e+03 [ +0%] 189u + 23.51s [ 0%]
0.61± 0.06/array = 1.25e+03±1.3e+02 [ -77%] 1,251u + 1.489s [ 489%]
0.40± 0.06/dsptch_f = 813±1.1e+02 [ -65%] 1,142u + 39.8s [ 456%]
0.63± 2.49/dispatch = 1.29e+03±5.1e+03 [ -78%] 1,536u + 34.76s [ 639%]
386.28±15.08/fork = 7.91e+05±3.1e+04 [ -100%] 3.328e+04u + 1.016e+06s [493,535%]

µsecs±error/4,096 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.14± 0.89/alloc = 582±3.7e+03 [ +0%] 403.9u + 45.66s [ 0%]
0.61± 0.13/array = 2.52e+03±5.2e+02 [ -77%] 2,507u + 3.119s [ 458%]
0.37± 0.06/dsptch_f = 1.51e+03±2.5e+02 [ -61%] 2,129u + 52.21s [ 385%]
0.53± 0.09/dispatch = 2.18e+03±3.6e+02 [ -73%] 2,759u + 52.36s [ 526%]
419.85± 3.02/fork = 1.72e+06±1.2e+04 [ -100%] 6.283e+04u + 2.145e+06s [491,064%]

µsecs±error/8,192 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.13± 0.51/alloc = 1.03e+03±4.1e+03 [ +0%] 829.2u + 90.52s [ 0%]
0.62± 0.13/array = 5.07e+03±1.1e+03 [ -80%] 5,044u + 6.089s [ 449%]
0.36± 0.06/dsptch_f = 2.92e+03±5.3e+02 [ -65%] 4,128u + 93.91s [ 359%]
0.51± 0.08/dispatch = 4.17e+03±6.5e+02 [ -75%] 5,355u + 92.06s [ 492%]
511.74±52.66/fork = 4.19e+06±4.3e+05 [ -100%] 2.16e+05u + 4.833e+06s [548,833%]

µsecs±error/16,384 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.13± 0.25/alloc = 2.16e+03±4.1e+03 [ +0%] 1,501u + 450.9s [ 0%]
0.64± 0.04/array = 1.05e+04±7e+02 [ -79%] 1.028e+04u + 218.8s [ 438%]
0.31± 0.09/dsptch_f = 5.04e+03±1.5e+03 [ -57%] 7,015u + 510.6s [ 286%]
0.43± 0.23/dispatch = 7.1e+03±3.7e+03 [ -70%] 8,921u + 397s [ 377%]

µsecs±error/32,768 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.14± 0.23/alloc = 4.52e+03±7.4e+03 [ +0%] 2,944u + 922.4s [ 0%]
0.63± 0.01/array = 2.05e+04±4.6e+02 [ -78%] 2.028e+04u + 231.7s [ 431%]
0.34± 0.07/dsptch_f = 1.12e+04±2.2e+03 [ -60%] 1.513e+04u + 1,910s [ 341%]
0.47± 0.08/dispatch = 1.55e+04±2.6e+03 [ -71%] 1.918e+04u + 2,033s [ 449%]

µsecs±error/65,536 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.43± 1.57/alloc = 2.8e+04±1e+05 [ +0%] 5,719u + 1,744s [ 0%]
0.66± 0.01/array = 4.33e+04±7.9e+02 [ -35%] 4.251e+04u + 745.4s [ 480%]
0.33± 0.06/dsptch_f = 2.18e+04±3.8e+03 [ +28%] 2.879e+04u + 5,195s [ 355%]
0.46± 0.06/dispatch = 3.01e+04±3.9e+03 [ -7%] 3.691e+04u + 5,419s [ 467%]

µsecs±error/131,072 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
1.42±31.87/alloc = 1.86e+05±4.2e+06 [ +0%] 1.148e+04u + 5,315s [ 0%]
0.67± 0.01/array = 8.8e+04±9.1e+02 [ +112%] 8.706e+04u + 935.2s [ 424%]
0.32± 0.05/dsptch_f = 4.18e+04±6.5e+03 [ +346%] 5.484e+04u + 1.144e+04s [ 295%]
0.45± 0.05/dispatch = 5.86e+04±7.1e+03 [ +218%] 7.077e+04u + 1.204e+04s [ 393%]

µsecs±error/262,144 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.85±13.37/alloc = 2.24e+05±3.5e+06 [ +0%] 2.274e+04u + 7,169s [ 0%]
0.72± 0.17/array = 1.89e+05±4.5e+04 [ +18%] 1.811e+05u + 2,961s [ 515%]
0.31± 0.04/dsptch_f = 8.21e+04±1.1e+04 [ +172%] 1.053e+05u + 2.606e+04s [ 339%]
0.46± 0.05/dispatch = 1.21e+05±1.4e+04 [ +85%] 1.4e+05u + 3.168e+04s [ 474%]

µsecs±error/524,288 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
0.22± 0.43/alloc = 1.15e+05±2.3e+05 [ +0%] 4.556e+04u + 8,460s [ 0%]
0.90± 0.29/array = 4.73e+05±1.5e+05 [ -76%] 4.315e+05u + 4,137s [ 706%]
0.31± 0.04/dsptch_f = 1.64e+05±1.9e+04 [ -30%] 2.068e+05u + 5.53e+04s [ 385%]
0.46± 0.09/dispatch = 2.42e+05±4.8e+04 [ -52%] 2.638e+05u + 7.415e+04s [ 526%]

SYNCHRONOUS: Microseconds to *complete* execution (avg. over 60 seconds)

µsecs±error/1 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
3.59± 0.28/loop = 3.59±0.28 [ +0%] 2.873u + 0.7078s [ 0%]
3.64± 0.34/apply = 3.64±0.34 [ -1%] 2.92u + 0.6984s [ 1%]
26.03± 5.81/serial = 26±5.8 [ -86%] 13.34u + 19.47s [ 816%]
31.70± 8.08/parallel = 31.7±8.1 [ -89%] 13.6u + 28.5s [ 1,076%]
28.22± 6.09/queues = 28.2±6.1 [ -87%] 15.53u + 19.61s [ 881%]
58.37± 9.58/openmp = 58.4±9.6 [ -94%] 31.38u + 73.56s [ 2,831%]
199.05±200.63/thread = 199±2e+02 [ -98%] 26.08u + 114.1s [ 3,814%]

µsecs±error/2 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
3.30± 0.18/loop = 6.59±0.37 [ +0%] 5.655u + 0.9082s [ 0%]
6.81± 1.24/apply = 13.6±2.5 [ -52%] 10.99u + 9.674s [ 215%]
14.98± 3.24/serial = 30±6.5 [ -78%] 18.54u + 19.26s [ 476%]
18.86± 5.16/parallel = 37.7±10 [ -83%] 21.42u + 38.79s [ 817%]
18.69± 3.98/queues = 37.4±8 [ -82%] 27.87u + 28.77s [ 763%]
29.27± 4.82/openmp = 58.5±9.6 [ -89%] 35.22u + 74.46s [ 1,571%]
236.94±222.33/thread = 474±4.4e+02 [ -99%] 49.51u + 329.9s [ 5,681%]

µsecs±error/4 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
3.06± 0.12/loop = 12.2±0.47 [ +0%] 11.28u + 0.9113s [ 0%]
6.38± 1.38/apply = 25.5±5.5 [ -52%] 21.63u + 27.37s [ 302%]
9.60± 2.14/serial = 38.4±8.6 [ -68%] 27.75u + 20.03s [ 292%]
14.47± 2.62/parallel = 57.9±10 [ -79%] 42.32u + 58.05s [ 723%]
13.34± 2.43/queues = 53.4±9.7 [ -77%] 50.32u + 54.64s [ 761%]
14.48± 2.44/openmp = 57.9±9.7 [ -79%] 40.88u + 72.98s [ 834%]
765.02±146.39/thread = 3.06e+03±5.9e+02 [ -100%] 133.5u + 3,056s [ 26,057%]

µsecs±error/8 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.94± 0.16/loop = 23.5±1.2 [ +0%] 22.54u + 0.9127s [ 0%]
4.93± 0.90/apply = 39.5±7.2 [ -40%] 38.35u + 39.39s [ 232%]
6.92± 1.23/serial = 55.4±9.8 [ -58%] 46.93u + 21.54s [ 192%]
10.35± 1.59/parallel = 82.8±13 [ -72%] 75.51u + 80.27s [ 564%]
10.17± 1.63/queues = 81.3±13 [ -71%] 103.9u + 84.11s [ 702%]
7.60± 1.40/openmp = 60.8±11 [ -61%] 52.75u + 71.71s [ 431%]
964.98±127.91/thread = 7.72e+03±1e+03 [ -100%] 244.8u + 7,726s [ 33,892%]

µsecs±error/16 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.88± 0.04/loop = 46±0.67 [ +0%] 45.04u + 0.9285s [ 0%]
2.49± 0.51/apply = 39.8±8.1 [ +16%] 58.48u + 36.6s [ 107%]
6.14± 1.21/serial = 98.3±19 [ -53%] 93.05u + 23.1s [ 153%]
6.06± 0.95/parallel = 96.9±15 [ -53%] 131.6u + 86.59s [ 375%]
6.04± 3.83/queues = 96.6±61 [ -52%] 172.5u + 85.43s [ 461%]
4.12± 0.67/openmp = 66±11 [ -30%] 75.33u + 71.15s [ 219%]
1024.88±131.54/thread = 1.64e+04±2.1e+03 [ -100%] 432.5u + 1.629e+04s [ 36,286%]

µsecs±error/32 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.86± 0.22/loop = 91.5±6.9 [ +0%] 90.26u + 1.194s [ 0%]
1.80± 0.53/apply = 57.6±17 [ +59%] 113.2u + 38.64s [ 66%]
4.49± 0.64/serial = 144±21 [ -36%] 152.9u + 19.62s [ 89%]
3.90± 0.49/parallel = 125±16 [ -27%] 250.7u + 88.44s [ 271%]
3.66± 1.51/queues = 117±48 [ -22%] 293.5u + 58.75s [ 285%]
2.46± 0.37/openmp = 78.7±12 [ +16%] 121u + 72s [ 111%]
1141.46±106.05/thread = 3.65e+04±3.4e+03 [ -100%] 860u + 3.564e+04s [ 39,814%]

µsecs±error/64 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.83± 0.02/loop = 181±1.1 [ +0%] 180.1u + 0.977s [ 0%]
1.26± 0.18/apply = 80.8±12 [ +124%] 208.4u + 37.49s [ 36%]
3.91± 0.28/serial = 250±18 [ -28%] 286.6u + 22.55s [ 71%]
2.65± 0.32/parallel = 170±20 [ +7%] 465.5u + 84.52s [ 204%]
3.38± 0.61/queues = 217±39 [ -16%] 619.9u + 81.5s [ 287%]
1.62± 0.25/openmp = 104±16 [ +75%] 213.2u + 73.02s [ 58%]
1169.36±103.51/thread = 7.48e+04±6.6e+03 [ -100%] 1,584u + 7.271e+04s [ 40,936%]

µsecs±error/128 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.82± 0.01/loop = 361±1.2 [ +0%] 360u + 0.9864s [ 0%]
1.04± 0.12/apply = 133±16 [ +171%] 410.1u + 38.65s [ 24%]
3.69± 0.21/serial = 473±26 [ -24%] 551.4u + 24.43s [ 60%]
2.02± 0.22/parallel = 258±28 [ +40%] 822u + 90.94s [ 153%]
3.35± 0.82/queues = 428±1.1e+02 [ -16%] 1,239u + 146.7s [ 284%]
1.26± 0.22/openmp = 162±29 [ +123%] 417.3u + 77.4s [ 37%]
1180.73±111.26/thread = 1.51e+05±1.4e+04 [ -100%] 3,014u + 1.471e+05s [ 41,467%]

µsecs±error/256 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.82± 0.05/loop = 723±12 [ +0%] 720.5u + 1.633s [ 0%]
0.92± 0.10/apply = 235±24 [ +208%] 782.5u + 39.39s [ 14%]
3.58± 0.19/serial = 916±50 [ -21%] 1,095u + 28.95s [ 56%]
1.78± 0.18/parallel = 455±47 [ +59%] 1,585u + 103s [ 134%]
3.28± 0.21/queues = 841±55 [ -14%] 2,455u + 270.1s [ 277%]
1.10± 0.20/openmp = 282±51 [ +156%] 844.7u + 84.23s [ 29%]
1183.97±125.56/thread = 3.03e+05±3.2e+04 [ -100%] 6,007u + 2.973e+05s [ 41,909%]

µsecs±error/512 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.82± 0.01/loop = 1.44e+03±6.2 [ +0%] 1,440u + 1.854s [ 0%]
0.85± 0.08/apply = 435±42 [ +232%] 1,563u + 40.43s [ 11%]
3.53± 0.16/serial = 1.81e+03±80 [ -20%] 2,182u + 37.31s [ 54%]
1.65± 0.15/parallel = 845±77 [ +71%] 3,121u + 108.9s [ 124%]
3.23± 0.19/queues = 1.66e+03±99 [ -13%] 4,773u + 557.7s [ 270%]
0.94± 0.18/openmp = 479±91 [ +201%] 1,567u + 81.07s [ 14%]
1184.66±129.03/thread = 6.07e+05±6.6e+04 [ -100%] 1.179e+04u + 5.956e+05s [ 42,016%]

µsecs±error/1,024 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.82± 0.01/loop = 2.88e+03±6.5 [ +0%] 2,880u + 2.214s [ 0%]
0.82± 0.09/apply = 837±88 [ +245%] 3,089u + 40.06s [ 9%]
3.51± 0.14/serial = 3.59e+03±1.4e+02 [ -20%] 4,361u + 59.2s [ 53%]
1.59± 0.13/parallel = 1.63e+03±1.3e+02 [ +77%] 6,165u + 136.8s [ 119%]
3.25± 0.74/queues = 3.33e+03±7.6e+02 [ -13%] 9,458u + 1,197s [ 270%]
0.89± 0.14/openmp = 911±1.5e+02 [ +216%] 3,134u + 93.56s [ 12%]
1211.52±112.92/thread = 1.24e+06±1.2e+05 [ -100%] 2.385e+04u + 1.207e+06s [ 42,593%]

µsecs±error/2,048 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.81± 0.01/loop = 5.77e+03±11 [ +0%] 5,761u + 3.552s [ 0%]
0.80± 0.07/apply = 1.63e+03±1.4e+02 [ +254%] 6,050u + 43.06s [ 6%]
3.44± 0.11/serial = 7.05e+03±2.2e+02 [ -18%] 8,587u + 98.01s [ 51%]
1.58± 0.12/parallel = 3.23e+03±2.5e+02 [ +79%] 1.229e+04u + 201.4s [ 117%]
3.20± 0.23/queues = 6.56e+03±4.6e+02 [ -12%] 1.869e+04u + 2,287s [ 264%]
0.82± 0.14/openmp = 1.67e+03±2.9e+02 [ +244%] 6,030u + 91.75s [ 6%]
1240.29±49.39/thread = 2.54e+06±1e+05 [ -100%] 4.873e+04u + 2.489e+06s [ 43,926%]

µsecs±error/4,096 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.81± 0.00/loop = 1.15e+04±14 [ +0%] 1.152e+04u + 4.097s [ 0%]
0.78± 0.06/apply = 3.21e+03±2.6e+02 [ +259%] 1.216e+04u + 45.48s [ 6%]
3.36± 0.10/serial = 1.38e+04±4.1e+02 [ -16%] 1.653e+04u + 167.1s [ 45%]
1.57± 0.11/parallel = 6.41e+03±4.5e+02 [ +80%] 2.448e+04u + 341.7s [ 115%]
3.28± 0.25/queues = 1.34e+04±1e+03 [ -14%] 3.785e+04u + 5,071s [ 272%]
0.78± 0.14/openmp = 3.2e+03±5.7e+02 [ +260%] 1.184e+04u + 96.2s [ 4%]
1232.05±107.67/thread = 5.05e+06±4.4e+05 [ -100%] 9.843e+04u + 4.875e+06s [ 43,054%]

µsecs±error/8,192 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.79± 0.00/loop = 2.29e+04±30 [ +0%] 2.288e+04u + 10.48s [ 0%]
0.77± 0.06/apply = 6.34e+03±5e+02 [ +261%] 2.444e+04u + 55.8s [ 7%]
3.29± 0.11/serial = 2.69e+04±9e+02 [ -15%] 3.23e+04u + 296s [ 42%]
1.59± 0.13/parallel = 1.3e+04±1.1e+03 [ +76%] 4.841e+04u + 703.4s [ 115%]
3.09± 0.29/queues = 2.53e+04±2.4e+03 [ -9%] 7.057e+04u + 6,884s [ 238%]
0.77± 0.15/openmp = 6.29e+03±1.2e+03 [ +264%] 2.318e+04u + 106.6s [ 2%]
1237.49±93.46/thread = 1.01e+07±7.7e+05 [ -100%] 1.768e+05u + 9.877e+06s [ 43,821%]

µsecs±error/16,384 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.50± 0.00/loop = 4.1e+04±53 [ +0%] 4.097e+04u + 22.67s [ 0%]
0.70± 0.05/apply = 1.14e+04±8.7e+02 [ +259%] 4.43e+04u + 76.68s [ 8%]
2.95± 0.12/serial = 4.84e+04±2e+03 [ -15%] 5.843e+04u + 1,495s [ 46%]
1.49± 0.10/parallel = 2.45e+04±1.7e+03 [ +68%] 9.294e+04u + 1,490s [ 130%]
3.20± 0.23/queues = 5.24e+04±3.7e+03 [ -22%] 1.443e+05u + 2.185e+04s [ 305%]
0.74± 0.10/openmp = 1.22e+04±1.6e+03 [ +236%] 4.133e+04u + 111.1s [ 1%]

µsecs±error/32,768 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
2.35± 0.00/loop = 7.7e+04±79 [ +0%] 7.691e+04u + 42.3s [ 0%]
0.66± 0.05/apply = 2.15e+04±1.6e+03 [ +258%] 8.354e+04u + 108.5s [ 9%]
2.71± 0.09/serial = 8.88e+04±2.9e+03 [ -13%] 1.064e+05u + 3,037s [ 42%]
1.47± 0.08/parallel = 4.82e+04±2.8e+03 [ +60%] 1.824e+05u + 4,690s [ 143%]
3.16± 0.22/queues = 1.03e+05±7.2e+03 [ -26%] 2.824e+05u + 4.546e+04s [ 326%]
0.73± 0.07/openmp = 2.39e+04±2.2e+03 [ +221%] 7.746e+04u + 137.8s [ 1%]

µsecs±error/65,536 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
1.75± 0.00/loop = 1.15e+05±86 [ +0%] 1.145e+05u + 75.38s [ 0%]
0.51± 0.05/apply = 3.35e+04±3e+03 [ +242%] 1.269e+05u + 205.3s [ 11%]
2.07± 0.07/serial = 1.35e+05±4.9e+03 [ -15%] 1.691e+05u + 6,997s [ 54%]
1.37± 0.05/parallel = 9.01e+04±3.5e+03 [ +27%] 3.374e+05u + 1.177e+04s [ 205%]
3.19± 0.19/queues = 2.09e+05±1.2e+04 [ -45%] 5.326e+05u + 1.111e+05s [ 462%]
0.65± 0.02/openmp = 4.26e+04±1.3e+03 [ +169%] 1.16e+05u + 175.5s [ 1%]

µsecs±error/131,072 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
1.56± 0.00/loop = 2.04e+05±3.4e+02 [ +0%] 2.041e+05u + 150.1s [ 0%]
0.45± 0.03/apply = 5.91e+04±4e+03 [ +246%] 2.291e+05u + 226.8s [ 12%]
1.82± 0.05/serial = 2.39e+05±6.5e+03 [ -14%] 3.044e+05u + 1.394e+04s [ 56%]
1.35± 0.04/parallel = 1.76e+05±5.5e+03 [ +16%] 6.554e+05u + 2.749e+04s [ 234%]
3.33± 0.17/queues = 4.36e+05±2.2e+04 [ -53%] 1.064e+06u + 2.546e+05s [ 546%]
0.62± 0.03/openmp = 8.11e+04±4.2e+03 [ +152%] 2.072e+05u + 242.9s [ 2%]

µsecs±error/262,144 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
1.44± 0.00/loop = 3.78e+05±2.6e+02 [ +0%] 3.776e+05u + 182.9s [ 0%]
0.42± 0.03/apply = 1.1e+05±6.8e+03 [ +243%] 4.292e+05u + 380.2s [ 14%]
1.68± 0.05/serial = 4.41e+05±1.2e+04 [ -14%] 5.69e+05u + 3.531e+04s [ 60%]
1.32± 0.03/parallel = 3.46e+05±7.6e+03 [ +9%] 1.273e+06u + 6.527e+04s [ 254%]
3.23± 0.05/queues = 8.47e+05±1.3e+04 [ -55%] 2.071e+06u + 5.011e+05s [ 581%]
0.46± 0.03/openmp = 1.21e+05±6.9e+03 [ +212%] 3.823e+05u + 344.1s [ 1%]

µsecs±error/524,288 = WALL(µs)±error [+-rate] USER (µs) + SYS (µs) [overhead]
1.38± 0.00/loop = 7.23e+05±4.2e+02 [ +0%] 7.221e+05u + 585.3s [ 0%]
0.40± 0.02/apply = 2.12e+05±1e+04 [ +241%] 8.258e+05u + 687.3s [ 14%]
1.62± 0.03/serial = 8.47e+05±1.5e+04 [ -15%] 1.107e+06u + 8.113e+04s [ 64%]
1.31± 0.03/parallel = 6.88e+05±1.3e+04 [ +5%] 2.498e+06u + 1.528e+05s [ 267%]
3.26± 0.03/queues = 1.71e+06±1.6e+04 [ -58%] 4.125e+06u + 1.041e+06s [ 615%]
0.41± 0.03/openmp = 2.17e+05±1.3e+04 [ +233%] 7.304e+05u + 594.9s [ 1%]

Copyright © 2009 Apple Inc. All Rights Reserved. Updated: 2009-09-08