Where The Work Is Really Done – Causal Profiling

Once a program has been debugged and works properly, it might be time to start optimizing it. A common way of doing this is profiling: watching a program execute and measuring how much computing time each part of it takes. This works well for most programs, but it gets complicated when the work runs in parallel on more than one core. A profiler may count the time one thread spends waiting for work on another core to finish, giving misleading results. To solve this problem, a method called causal profiling was developed.

In causal profiling, markers are placed in the code, and the profiler measures how quickly the program reaches them. The profiler can't actually make a given piece of code run faster. Instead, it slows down everything else running alongside that code, which simulates a relative speedup, and then watches how the rate of reaching the markers changes. [Daniel Morsing] took this idea and implemented it in Go. His example showed that speeding up one part of the program by 95% would make the entire program 22% faster. A regular profiler, by contrast, pointed to only a 3% gain, which was far less informative than the causal profiler's 22% estimate.
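The slowdown trick can be checked with a little arithmetic. Below is a minimal model that assumes a program with two threads. Thread A spends `x` ms in the code under study and `y` ms elsewhere, while thread B does `z` ms of independent work. A virtual speedup of fraction `s` pauses thread B for a total of `s*x` while A runs the selected code. Subtracting those inserted pauses from the measured time gives the same result as actually making that code faster. This is a simplified sketch of the idea, not the implementation used by any real causal profiler.

```go
package main

import "fmt"

// maxF returns the larger of two times (in milliseconds).
func maxF(p, q float64) float64 {
	if p > q {
		return p
	}
	return q
}

// realSpeedup is the run time if the selected code (x ms on thread A)
// really ran a fraction s faster. Thread A also does y ms of other work,
// and thread B does z ms in parallel.
func realSpeedup(x, y, z, s float64) float64 {
	return maxF((1-s)*x+y, z)
}

// virtualSpeedup leaves the selected code at normal speed, but while it
// runs, thread B is paused for s times as long. The measured run time is
// then corrected by subtracting the total inserted delay.
func virtualSpeedup(x, y, z, s float64) float64 {
	delay := s * x                 // total pause inserted into thread B
	measured := maxF(x+y, z+delay) // A runs unchanged, B is slowed down
	return measured - delay
}

func main() {
	x, y, z := 30.0, 20.0, 45.0 // hypothetical timings in ms
	for _, s := range []float64{0, 0.25, 0.5} {
		fmt.Printf("speedup %.0f%%: real %.1f ms, virtual %.1f ms\n",
			100*s, realSpeedup(x, y, z, s), virtualSpeedup(x, y, z, s))
	}
}
```

With these numbers, the corrected virtual run time matches the real one at every speedup. Both level off at 45 ms, because thread B becomes the bottleneck.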

We got this tip from [Greg Kennedy], who notes that he hasn’t seen much use of causal profiling outside the academic world. Still, we agree that the method could be useful for keeping track of a multi-threaded program’s efficiency. If you know of any other ways of solving this problem, or have seen causal profiling in use in the wild, let us know in the comments below.

Header image: Alan Lorenzo [CC BY-SA 3.0].

13 thoughts on “Where The Work Is Really Done – Causal Profiling”

    1. Basically the idea is this: you can have your regular profiler tell you how long instructions take and point out “slow” paths. But this isn’t always useful, because what is “slow” may be waiting on something even slower, or maybe a “fast” thread actually finished its work and went to sleep when it COULD have done more work, etc. CPU time and wall-clock time are very different in a multithreaded environment.

      Instead, the question you are trying to ask with causal profiling is “If I made this part run X% faster, how much faster would the whole program go?” You can’t answer that directly, since you can’t just make code “go faster”… but you can make all the rest of the code go SLOWER to simulate the relative improvement.

      The profiler simply automates all that for you and produces a report that tells the developer what to tackle next.
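The question in the comment above (“if I made this part X% faster…”) can be turned into a small report. This sketch uses a hypothetical fork-join program: tasks A and B run in parallel, then task C runs, with invented durations. For each task and a few speedup levels, it computes how much shorter the whole run gets. The result is roughly the kind of per-location curve a causal profiler reports.

```go
package main

import "fmt"

// program holds task durations in ms: a and b run in parallel, then c.
type program struct{ a, b, c float64 }

// runtime returns the total run time: the slower parallel task plus c.
func (p program) runtime() float64 {
	if p.a > p.b {
		return p.a + p.c
	}
	return p.b + p.c
}

// programSpeedup returns the fraction by which the whole run time shrinks
// if the named task takes a fraction s less time (0.25 means 25% less).
// An unknown task name leaves the program unchanged and returns 0.
func programSpeedup(p program, task string, s float64) float64 {
	q := p
	switch task {
	case "A":
		q.a *= 1 - s
	case "B":
		q.b *= 1 - s
	case "C":
		q.c *= 1 - s
	}
	return 1 - q.runtime()/p.runtime()
}

func main() {
	p := program{a: 40, b: 60, c: 20} // hypothetical durations in ms
	for _, task := range []string{"A", "B", "C"} {
		for _, s := range []float64{0.25, 0.5} {
			fmt.Printf("speed up %s by %.0f%% -> program %.1f%% faster\n",
				task, 100*s, 100*programSpeedup(p, task, s))
		}
	}
}
```

In this toy, speeding up A never helps. B's gain stops at 25% once B drops below A's 40 ms, because A then becomes the slower branch.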

  1. Isn’t Casual Profiling what’s done on Tinder? Ahh, you meant Causal Profiling, and nobody proofreads HD articles before they are published…

    And I suspect whoever wrote the short article above also didn’t read (or maybe just didn’t understand) the article he linked to.

  2. Enter routine, set pin high, do routine, set pin low, exit routine, and put an oscilloscope on the pin. Very useful for looking at interrupt load on a microcontroller. If you would rather have a number and don’t have an oscilloscope, add an RC filter and measure with a voltmeter.
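The voltmeter trick works because a first-order RC low-pass filter averages the pin’s square wave. The reading therefore settles near the duty cycle times the supply voltage, and dividing by the supply voltage gives the fraction of time spent in the routine. Below is a small numerical sketch of that idea. It assumes a 3.3 V pin that is high 30% of the time at a 1 kHz toggle rate and an RC time constant of 100 ms; all values are illustrative.

```go
package main

import "fmt"

// filteredVoltage simulates a GPIO pin driving an RC low-pass filter.
// The pin is at vHigh volts for duty*period of every period and at 0 V
// otherwise. The capacitor voltage is integrated with simple Euler steps
// of dV/dt = (Vin - V) / RC, and the final voltage is returned.
func filteredVoltage(vHigh, duty, period, rc, simTime, dt float64) float64 {
	v := 0.0
	steps := int(simTime / dt)
	for i := 0; i < steps; i++ {
		t := float64(i) * dt
		phase := t - period*float64(int(t/period)) // time within current period
		vin := 0.0
		if phase < duty*period {
			vin = vHigh
		}
		v += (vin - v) * dt / rc
	}
	return v
}

func main() {
	const vHigh, duty = 3.3, 0.30
	// 1 ms toggle period, 100 ms time constant, 2 s simulated, 1 µs steps.
	v := filteredVoltage(vHigh, duty, 1e-3, 0.1, 2.0, 1e-6)
	fmt.Printf("meter reads %.3f V -> estimated load %.1f%%\n", v, 100*v/vHigh)
}
```

Choosing an RC time constant much longer than the toggle period keeps the ripple small, so a slow multimeter sees a steady value.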

  3. I do something similar for quick experiments.
    I slow down code by a few cycles to see if it slows overall performance. If it does, then it’s a likely candidate for a performance increase if I spend time optimizing it.

    However, in theory this method won’t work when the threads are equally balanced and all take the same amount of time. In practice, it has worked for me 95% of the time.
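The limitation mentioned above is easy to see in a toy model (ours, with invented numbers). Suppose two parallel threads take the same time. Adding a delay to either one slows the program, so the slowdown test flags both. Speeding up just one of them gains nothing, though, because the other still sets the finish time.

```go
package main

import "fmt"

// finish is the run time of two threads that start together and join at
// the end: the slower thread decides when the program is done.
func finish(t1, t2 float64) float64 {
	if t1 > t2 {
		return t1
	}
	return t2
}

func main() {
	t1, t2 := 50.0, 50.0 // balanced threads, times in ms
	base := finish(t1, t2)

	slower := finish(t1+5, t2) // slowdown test: add 5 ms to thread 1
	faster := finish(t1-5, t2) // what we hope to gain by optimizing it

	fmt.Printf("base %.0f ms, thread 1 slowed: %.0f ms, thread 1 sped up: %.0f ms\n",
		base, slower, faster)
}
```

This asymmetry between slowing down and speeding up is the gap that a simulated (virtual) speedup is meant to close.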

  4. start := time.Now()
    callSlowCode()
    end := time.Now()
    delta := end.Sub(start)

    Been doing this for years. And something that just popped into my head: if you are on a microcontroller, use an external timer that is activated by a pin from the micro you’re debugging.
