OP: Lisrelchen

Efficient C++: Performance Programming Techniques

11
Lisrelchen posted on 2015-7-12 09:42:05
Chapter 10 Key Points
  • Inlining can improve performance. The goal is to find a program's fast path and inline it, though inlining this path may not be trivial.

  • Conditional inlining lets you prevent inlining from occurring at all. Turning it off reduces compile time and simplifies debugging during the earlier phases of development (see the sketch after this list).

  • Selective inlining inlines a method only in some places. By inlining calls only on performance-critical paths, it offsets some of the code-size explosion that inlining a method everywhere can cause.

  • Recursive inlining is an ugly but effective technique for improving the performance of recursive methods.

  • Take care with local static variables: inlining a function that contains one may, on some compilers, produce multiple copies of what should be a single variable.

  • Inlining is aimed at call elimination. Be sure of the real cost of calls on your system before using inlining.
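
A minimal, single-file sketch of conditional inlining (my own illustration, not code from the book; COND_INLINE and INLINE_ON are invented names). Building with -DINLINE_ON enables inlining for release builds, while a plain debug build keeps the call out-of-line:

```cpp
#include <iostream>

// Build with -DINLINE_ON to enable inlining; a plain debug build keeps
// bump() out-of-line, which compiles faster and is easy to break on.
#ifdef INLINE_ON
#define COND_INLINE inline
#else
#define COND_INLINE /* out-of-line */
#endif

class Counter {
public:
    COND_INLINE void bump();          // inlining is switchable per build
    int value() const { return n_; }
private:
    int n_ = 0;
};

// When COND_INLINE expands to `inline`, this definition must be visible
// in every translation unit (it belongs in the header); when it expands
// to nothing, it can live in a single .cpp file instead.
COND_INLINE void Counter::bump() { ++n_; }

int main() {
    Counter c;
    for (int i = 0; i < 1000; ++i)
        c.bump();                     // the fast-path call we may inline
    std::cout << c.value() << '\n';   // prints 1000
}
```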



12
Lisrelchen posted on 2015-7-12 09:44:24
Chapter 11 Key Points
  • The STL (Standard Template Library) is an uncommon combination of abstraction, flexibility, and efficiency.

  • Depending on your application, some containers are more efficient than others for a particular usage pattern (the sketch after this list times one such pattern).

  • Unless you know something about the problem domain that the STL doesn't, it is unlikely that you will beat the performance of an STL implementation by a wide enough margin to justify the effort.

  • It is possible, however, to exceed the performance of an STL implementation in some specific scenarios.
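
To make the usage-pattern point concrete, here is a small timing sketch (not from the book) comparing std::vector and std::list on one operation, front insertion, where their costs differ asymptotically:

```cpp
#include <chrono>
#include <iostream>
#include <list>
#include <vector>

// The same logical operation -- insert n ints at the front -- timed
// against two containers. vector shifts every element on each front
// insert (O(n) per insert); list splices in a node (O(1) per insert).
template <typename Container>
long long timeFrontInserts(int n) {
    using clock = std::chrono::steady_clock;
    Container c;
    auto t0 = clock::now();
    for (int i = 0; i < n; ++i)
        c.insert(c.begin(), i);
    auto t1 = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}

int main() {
    const int n = 20000;
    std::cout << "vector: " << timeFrontInserts<std::vector<int>>(n) << " us\n"
              << "list:   " << timeFrontInserts<std::list<int>>(n)   << " us\n";
    // Reverse the usage pattern (back inserts plus random access) and
    // vector wins decisively -- the pattern picks the container.
}
```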



13
Lisrelchen posted on 2015-7-12 09:48:54
Chapter 12 Key Points

Reference counting is not an automatic performance winner. Reference counting, execution speed, and resource conservation form a delicate interaction that must be evaluated carefully if performance is an important consideration. Reference counting may help or hurt performance depending on the usage pattern. The case in favor of reference counting is strengthened by any one of the following items:

  • The target object is a large resource consumer

  • The resource in question is expensive to allocate and free

  • The object is shared to a high degree; heavy use of the assignment operator and copy constructor keeps the reference count high

  • The creation or destruction of a reference is relatively cheap


If you reverse these items, you start leaning towards skipping reference counting in favor of the plain uncounted object.
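
To show the bookkeeping being traded against copy cost, here is a minimal, deliberately non-thread-safe reference-counting sketch (my own illustration, not the book's code; BigResource and Handle are invented names). In modern C++, std::shared_ptr provides a non-intrusive, thread-safe version of the same idea; the point of the sketch is to expose what each copy and destruction actually pays for:

```cpp
#include <iostream>
#include <string>
#include <utility>

// BigResource stands in for an object that is expensive to allocate,
// free, and copy -- the profile where reference counting pays off.
class BigResource {
public:
    explicit BigResource(std::string d) : data_(std::move(d)), refs_(1) {}
    void addRef()  { ++refs_; }
    void release() { if (--refs_ == 0) delete this; }
    const std::string& data() const { return data_; }
private:
    ~BigResource() = default;       // only release() may destroy
    std::string data_;
    int refs_;                      // not thread-safe by design here
};

// Handle is the counted smart pointer the client code copies around.
class Handle {
public:
    explicit Handle(std::string d) : p_(new BigResource(std::move(d))) {}
    Handle(const Handle& o) : p_(o.p_) { p_->addRef(); }   // cheap copy
    Handle& operator=(const Handle& o) {
        if (p_ != o.p_) { o.p_->addRef(); p_->release(); p_ = o.p_; }
        return *this;
    }
    ~Handle() { p_->release(); }
    const std::string& data() const { return p_->data(); }
private:
    BigResource* p_;
};

int main() {
    Handle a("expensive payload");
    Handle b = a;                   // shares the resource, no deep copy
    Handle c = b;                   // reference count is now 3
    std::cout << c.data() << '\n';
}   // the last release() frees the single BigResource
```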

14
Lisrelchen posted on 2015-7-12 09:50:50
Chapter 13 Key Points

Coding optimizations are local in scope and do not necessitate understanding of overall program design. This is a good place to start when you join an ongoing project whose design you don't yet understand.

The fastest code is the one that's never executed. Try the following to bail out of a costly computation:

  • Are you ever going to use the result? It sounds silly, but it happens. At times we perform computation and never use the results.

  • Do you need the results now? Defer a computation to the point where it is actually needed. Premature computations may never be used on some execution flows (a lazy-evaluation sketch follows this list).

  • Do you know the result already? We have seen costly computations performed even though their results were available two lines above. If you already computed it earlier in the execution flow, make the result available for reuse.
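
The second and third bullets combine naturally into lazy evaluation with a cached result. A minimal C++17 sketch (invented names, not from the book):

```cpp
#include <iostream>
#include <optional>

// A lazily computed, cached value: the expensive work runs only on the
// execution paths that actually ask for it, and at most once.
class Report {
public:
    int checksum() const {
        if (!checksum_)                       // not computed yet?
            checksum_ = expensiveChecksum();  // compute on first demand
        return *checksum_;                    // known result, reused free
    }
private:
    int expensiveChecksum() const {
        std::cout << "(computing...)\n";      // visible cost marker
        return 42;                            // placeholder for real work
    }
    mutable std::optional<int> checksum_;     // empty until first use
};

int main() {
    Report r;
    // A path that never calls r.checksum() never pays for it.
    std::cout << r.checksum() << '\n';        // computes once
    std::cout << r.checksum() << '\n';        // cached: no recompute
}
```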


Sometimes you cannot bail out, and you just have to perform the computation. The challenge now is to speed it up:

  • Is the computation overly generic? You only need to be as flexible as the domain requires, not more. Take advantage of simplifying assumptions. Reduced flexibility increases speed.

  • Some flexibility is hidden in library calls. You may gain speed by rolling your own version of specific library calls that are called often enough to justify the effort. Familiarize yourself with the hidden cost of those library and system calls that you use.

  • Minimize memory management calls. They are relatively expensive on most compilers.

  • If you consider the set of all possible input data, 20% of it shows up 80% of the time. Speed up the processing of typical input at the expense of other scenarios.

  • The speed differential among cache, RAM, and disk access is significant. Write cache-friendly code.
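
As an illustration of the last point, here is a sketch (not from the book) timing row-major against column-major traversal of the same matrix; the data and the arithmetic are identical, only the memory access pattern changes:

```cpp
#include <chrono>
#include <iostream>
#include <vector>

// Cache-friendly vs cache-hostile traversal. Row-major order walks
// memory sequentially; column-major order strides across it, missing
// the cache on nearly every access once the matrix outgrows it.
int main() {
    const int n = 2000;
    std::vector<int> m(static_cast<size_t>(n) * n, 1);
    using clock = std::chrono::steady_clock;

    long long sum = 0;
    auto t0 = clock::now();
    for (int r = 0; r < n; ++r)              // row-major: sequential
        for (int c = 0; c < n; ++c)
            sum += m[static_cast<size_t>(r) * n + c];
    auto t1 = clock::now();
    for (int c = 0; c < n; ++c)              // column-major: strided
        for (int r = 0; r < n; ++r)
            sum += m[static_cast<size_t>(r) * n + c];
    auto t2 = clock::now();

    auto us = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::microseconds>(b - a).count();
    };
    std::cout << "row-major:    " << us(t0, t1) << " us\n"
              << "column-major: " << us(t1, t2) << " us\n"
              << "(sum " << sum << ")\n";
}
```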



15
Lisrelchen posted on 2015-7-12 09:51:55
Chapter 14 Key Points
  • A fundamental tension exists between software performance and flexibility. On the 20% of your software that executes 80% of the time, performance often comes first at the expense of flexibility.

  • Caching opportunities may surface in the overall program design as well as in the minute coding details. You can often avoid big blobs of computation by simply stashing away the result of previous computations (a memoization sketch follows this list).

  • The use of efficient algorithms and data structures is a necessary but not sufficient condition for software efficiency.

  • Some computations may be necessary only on a subset of the overall likely execution scenarios. Those computations should be deferred to those execution paths that must have them. If a computation is performed prematurely, its result may go unused.

  • Large-scale software often tends towards chaos. One by-product of chaotic software is the execution of obsolete code: code that once upon a time served a purpose but no longer does. Periodic purges of obsolete code and other useless computations will boost performance as well as overall software hygiene.
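
One coding-level shape of the caching point is a memo table in front of a costly pure function; repeat inputs then skip the work entirely. A sketch under that assumption (invented names, not from the book):

```cpp
#include <cmath>
#include <iostream>
#include <unordered_map>

// A costly pure function standing in for a "big blob of computation".
double costly(double x) {
    double acc = 0.0;
    for (int i = 0; i < 1000000; ++i)
        acc += std::sin(x + i * 1e-6);
    return acc;
}

// The memoized front end: stash each result, keyed by its input.
double costlyMemoized(double x) {
    static std::unordered_map<double, double> memo;   // survives calls
    auto it = memo.find(x);
    if (it != memo.end())
        return it->second;                            // cache hit
    return memo.emplace(x, costly(x)).first->second;  // compute + stash
}

int main() {
    std::cout << costlyMemoized(0.5) << '\n';   // computes
    std::cout << costlyMemoized(0.5) << '\n';   // instant: cached
}
```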



16
Lisrelchen posted on 2015-7-12 09:53:47
Chapter 15 Key Points
  • SMP is currently the dominant MP architecture. It consists of multiple symmetric processors connected via a single bus to a single memory system. The bus is the scalability weak link in the SMP architecture. Large caches, one per processor, are meant to keep bus contention under control.

  • Amdahl's Law puts an upper limit on the potential scalability of an application. The scalability is limited by portions of the computation that are serialized.


The trick to scalability is to reduce and, if possible, eliminate serialized code. Following are some steps you can take towards that goal:

  • Split a monolithic task into multiple subtasks that are conducive to parallel execution by concurrent threads.

  • Code motion. Critical sections should contain critical code and nothing else. Code that does not directly manipulate shared resources should not reside in the critical section (see the sketch after this list).

  • Cache. At times, it may be possible to eliminate execution visits to a critical section by caching the result of an earlier visit.

  • Share nothing. If you only need a small, fixed number of resource instances, you should avoid the use of public resource pools. Make those instances thread-private and recycle them.

  • Partial sharing. Splitting one contended resource pool into two identical pools leaves each with half the contention.

  • Lock granularity. Don't fuse resources under the protection of the same lock unless they are always updated together.

  • False sharing. Don't place two hot locks in close proximity in the class definition. You don't want them to share the same cache line and trigger cache consistency storms.

  • Thundering herd. Investigate the characteristics of your locking calls: when a lock is freed, does it wake up all waiting threads or just one? Implementations that wake up all threads threaten the scalability of an application.

  • System and library calls. Investigate the characteristics of their implementation. Some of them are hiding significant portions of serialized code.

  • Reader/writer locks. Shared data that is read-mostly will benefit from these locks. They eliminate contention among reader threads.
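
A small sketch of the code-motion item (my own illustration, not the book's code): the expensive string formatting touches no shared state, so it runs before the lock, and only the shared push_back is serialized.

```cpp
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::mutex logMutex;
std::vector<std::string> sharedLog;   // the only truly shared resource

void logEvent(int id, int value) {
    // Outside the critical section: thread-private work.
    std::string line = "thread " + std::to_string(id) +
                       " value " + std::to_string(value);
    // Inside the critical section: shared work only.
    std::lock_guard<std::mutex> guard(logMutex);
    sharedLog.push_back(std::move(line));
}

int main() {
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back([i] {
            for (int v = 0; v < 1000; ++v)
                logEvent(i, v);
        });
    for (auto& t : pool) t.join();
    std::cout << sharedLog.size() << " entries\n";   // 4000
}
```

Had the formatting sat inside the lock_guard's scope, every thread would have serialized on work that never needed serializing; shrinking the critical section shrinks the Amdahl bottleneck.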

17
Lisrelchen posted on 2015-7-12 09:54:49
Chapter 16 Key Points
  • The farther the memory you want to use is from the processor, the longer it takes to access. The resources closest to the processor, the registers, are limited in capacity but extremely fast; optimizing their use can be very valuable (see the sketch after this list).

  • Virtual memory is not free. Indiscriminate reliance on system-maintained virtual structures can have significant performance ramifications, typically negative ones.

  • Context switches are expensive; avoid them.

  • Lastly, though we are aware that internally managed asynchronous I/O has its place, we also feel that the coming shift in processor architecture will significantly disadvantage monolithic threading approaches.
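
One common register-friendly pattern, sketched below (invented names, not from the book): hoist hot values out of member accesses into locals so the compiler can keep them in registers. Optimizers often do this themselves, but accesses through `this` can inhibit it when the compiler cannot prove the members are unchanged across the loop.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

struct Accumulator {
    std::vector<int> data;
    long long total = 0;

    void addAllSlow() {
        // total and data.size() are re-read through `this` each pass.
        for (std::size_t i = 0; i < data.size(); ++i)
            total += data[i];
    }

    void addAllFast() {
        long long t = 0;                    // can live in a register
        const std::size_t n = data.size();  // loop bound hoisted once
        for (std::size_t i = 0; i < n; ++i)
            t += data[i];
        total += t;                         // single write-back at the end
    }
};

int main() {
    Accumulator a;
    a.data.assign(1000, 2);
    a.addAllFast();
    std::cout << a.total << '\n';   // prints 2000
}
```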


