Skip to content

perf: reduce Span._finish_ns overhead in tracer and profiling #13720

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

brettlangdon
Copy link
Member

  • Remove Hooks class usage from BaseContextProvider and Tracer
  • Replace hook-based callbacks with core event dispatcher system
  • Optimize _update_active logic to reduce redundant finished() checks
  • Cache span._local_root property lookup in profiling link_span
  • Streamline span activation flow in DefaultContextProvider
  • Convert SimpleMovingAverage to Cython for C-speed mathematical operations
  • Add encoder caching to reduce redundant _trace_id_64bits property lookups

Testing against a locally running version of PyPA warehouse with a single gunicorn worker with tracing and profiling enabled we saw an improvement of:

  • a 4.9% improvement in requests per second (20.08 → 21.06 RPS)
  • 4.6% reduction in response time (49.79ms → 47.48ms)
  • elimination of 44,028 function calls (-7.2%)

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

  - Remove Hooks class usage from BaseContextProvider and Tracer
  - Replace hook-based callbacks with core event dispatcher system
  - Optimize _update_active logic to reduce redundant finished() checks
  - Cache span._local_root property lookup in profiling link_span
  - Streamline span activation flow in DefaultContextProvider
  - Convert SimpleMovingAverage to Cython for C-speed mathematical operations
  - Add encoder caching to reduce redundant _trace_id_64bits property lookups

Testing against a locally running version of PyPA warehouse with a single gunicorn worker with tracing and profiling enabled we saw an improvement of:

- a 4.9% improvement in requests per second (20.08 → 21.06 RPS)
- 4.6% reduction in response time (49.79ms → 47.48ms)
- elimination of 44,028 function calls (-7.2%)
Copy link
Contributor

github-actions bot commented Jun 19, 2025

CODEOWNERS have been resolved as:

ddtrace/internal/sma.pyi                                                @DataDog/apm-core-python
ddtrace/internal/sma.pyx                                                @DataDog/apm-core-python
src/native/event_hub.rs                                                 @DataDog/apm-core-python
.gitignore                                                              @DataDog/apm-core-python
ddtrace/_trace/provider.py                                              @DataDog/apm-sdk-api-python
ddtrace/_trace/tracer.py                                                @DataDog/apm-sdk-api-python
ddtrace/internal/_encoding.pyx                                          @DataDog/apm-core-python
ddtrace/internal/core/__init__.py                                       @DataDog/apm-core-python
ddtrace/internal/core/event_hub.py                                      @DataDog/apm-core-python
ddtrace/internal/datadog/profiling/stack_v2/__init__.py                 @DataDog/profiling-python
ddtrace/internal/native/_native.pyi                                     @DataDog/apm-core-python
ddtrace/internal/writer/writer.py                                       @DataDog/apm-core-python
ddtrace/profiling/collector/stack.pyx                                   @DataDog/profiling-python
pyproject.toml                                                          @DataDog/python-guild
setup.py                                                                @DataDog/python-guild
src/native/lib.rs                                                       @DataDog/apm-core-python
tests/suitespec.yml                                                     @DataDog/python-guild @DataDog/apm-core-python

Copy link
Contributor

github-actions bot commented Jun 19, 2025

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 273 ± 3 ms.

The average import time from base is: 275 ± 2 ms.

The import time difference between this PR and base is: -2.5 ± 0.1 ms.

Import time breakdown

The following import paths have grown:

ddtrace.auto 2.501 ms (0.92%)
ddtrace.bootstrap.sitecustomize 1.891 ms (0.69%)
ddtrace.bootstrap.preload 1.177 ms (0.43%)
ddtrace.internal.products 0.947 ms (0.35%)
importlib.metadata 0.947 ms (0.35%)
importlib.metadata._meta 0.660 ms (0.24%)
importlib.abc 0.083 ms (0.03%)
importlib.resources 0.083 ms (0.03%)
importlib.resources._common 0.083 ms (0.03%)
zipfile 0.072 ms (0.03%)
ddtrace.settings.profiling 0.142 ms (0.05%)
ddtrace.vendor.psutil 0.073 ms (0.03%)
ddtrace.vendor.psutil._pslinux 0.073 ms (0.03%)
ddtrace.vendor.psutil._psposix 0.073 ms (0.03%)
ddtrace.internal.datadog.profiling.stack_v2 0.070 ms (0.03%)
ddtrace.internal.datadog.profiling.stack_v2._stack_v2 0.070 ms (0.03%)
ddtrace.internal.runtime.runtime_metrics 0.088 ms (0.03%)
ddtrace.internal.runtime.metric_collectors 0.088 ms (0.03%)
ddtrace.appsec._common_module_patches 0.663 ms (0.24%)
ddtrace.appsec._asm_request_context 0.663 ms (0.24%)
ddtrace._trace.trace_handlers 0.051 ms (0.02%)
ddtrace 0.610 ms (0.22%)
ddtrace.trace 0.552 ms (0.20%)
ddtrace._trace.filters 0.445 ms (0.16%)
ddtrace._trace.processor 0.445 ms (0.16%)
ddtrace.internal.writer 0.138 ms (0.05%)
ddtrace.internal.writer.writer 0.138 ms (0.05%)
gzip 0.081 ms (0.03%)
ddtrace.internal.sma 0.057 ms (0.02%)
ddtrace.internal.dogstatsd 0.104 ms (0.04%)
ddtrace.vendor.dogstatsd 0.104 ms (0.04%)
ddtrace.vendor.dogstatsd.base 0.104 ms (0.04%)
queue 0.104 ms (0.04%)
ddtrace._trace.sampler 0.088 ms (0.03%)
ddtrace._trace.span 0.088 ms (0.03%)
ddtrace.internal.sampling 0.088 ms (0.03%)
ddtrace._trace.sampling_rule 0.088 ms (0.03%)
ddtrace._trace.tracer 0.106 ms (0.04%)
ddtrace.internal.debug 0.106 ms (0.04%)
ddtrace._monkey 0.058 ms (0.02%)
ddtrace.settings.asm 0.058 ms (0.02%)
ddtrace.appsec._constants 0.058 ms (0.02%)

The following import paths have shrunk:

ddtrace.auto 5.290 ms (1.94%)
ddtrace.bootstrap.sitecustomize 2.993 ms (1.10%)
ddtrace.bootstrap.preload 2.330 ms (0.85%)
ddtrace.internal.products 0.879 ms (0.32%)
importlib.metadata 0.879 ms (0.32%)
zipfile 0.632 ms (0.23%)
zipfile._path 0.632 ms (0.23%)
csv 0.104 ms (0.04%)
_csv 0.104 ms (0.04%)
importlib.abc 0.073 ms (0.03%)
importlib.resources 0.073 ms (0.03%)
importlib.resources._legacy 0.073 ms (0.03%)
importlib.metadata._itertools 0.070 ms (0.03%)
ddtrace.internal.remoteconfig.client 0.619 ms (0.23%)
ddtrace.internal.runtime.runtime_metrics 0.089 ms (0.03%)
ddtrace.internal.runtime.tag_collectors 0.089 ms (0.03%)
ddtrace.settings.profiling 0.054 ms (0.02%)
ddtrace.internal.datadog.profiling.ddup 0.054 ms (0.02%)
ddtrace.internal.datadog.profiling.ddup._ddup 0.054 ms (0.02%)
multiprocessing.sharedctypes 0.043 ms (0.02%)
multiprocessing.heap 0.043 ms (0.02%)
mmap 0.043 ms (0.02%)
ddtrace.appsec._common_module_patches 0.663 ms (0.24%)
ddtrace.appsec._asm_request_context 0.663 ms (0.24%)
ddtrace.appsec._utils 0.663 ms (0.24%)
ddtrace 2.298 ms (0.84%)
ddtrace._monkey 1.079 ms (0.40%)
ddtrace.appsec._listeners 0.965 ms (0.35%)
ddtrace.internal.core 0.965 ms (0.35%)
ddtrace.internal.core.event_hub 0.880 ms (0.32%)
ddtrace.vendor.packaging.specifiers 0.114 ms (0.04%)
ddtrace.trace 0.518 ms (0.19%)
ddtrace._trace.filters 0.382 ms (0.14%)
ddtrace._trace.processor 0.382 ms (0.14%)
ddtrace._trace.sampler 0.204 ms (0.07%)
ddtrace._trace.span 0.204 ms (0.07%)
pprint 0.121 ms (0.04%)
ddtrace.internal.dogstatsd 0.100 ms (0.04%)
ddtrace.vendor.dogstatsd 0.100 ms (0.04%)
ddtrace.vendor.dogstatsd.base 0.100 ms (0.04%)
ddtrace.vendor.dogstatsd.container 0.100 ms (0.04%)
ddtrace.internal.writer 0.078 ms (0.03%)
ddtrace.internal.writer.writer 0.078 ms (0.03%)
ddtrace._trace.tracer 0.099 ms (0.04%)
ddtrace.settings.peer_service 0.099 ms (0.04%)
ddtrace._trace.provider 0.037 ms (0.01%)
ddtrace.internal._unpatched 0.056 ms (0.02%)
json 0.030 ms (0.01%)
json.decoder 0.030 ms (0.01%)
re 0.030 ms (0.01%)
enum 0.030 ms (0.01%)
types 0.030 ms (0.01%)
subprocess 0.026 ms (0.01%)
contextlib 0.026 ms (0.01%)

@pr-commenter
Copy link

pr-commenter bot commented Jun 19, 2025

Benchmarks

Benchmark execution time: 2025-06-20 20:32:31

Comparing candidate commit 50dc57c in PR branch brettlangdon/perf.optimize.span.finish with baseline commit 7a55674 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 561 metrics, 3 unstable metrics.

brettlangdon and others added 12 commits June 20, 2025 08:05
We make the native extension cache more fine-grained by generating
hashes for each single native integration that we build. We then store
each generated artifact in the cache according to this FS structure

```
.ext_cache/           # Example:
└─ <module_name>/     #   "ddtrace.profiling.collector"
   └─ <source_hash>/  #   <sha256 hex digest>
      └─ <binary>     #   "collector.cpython-310-darwin.so"
```

We implement a custom command to allow retrieving the source to target
mapping from `setup.py` directly, instead of coding the same information
elsewhere, which would increase the maintenance cost of the whole build
process.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [ ] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
fixes error when we try to concat the string 'truncated' to bytes

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: Quinna Halim <quinna.halim@datadoghq.com>
Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
We do a minor setup.py cleanup, focusing on typing correctness.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
Add telemetry support for the legacy ATO SDK.

This will also be tested using system tests
DataDog/system-tests#4806

APPSEC-58051

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
This PR is a workaround for this [bug
report](#13700) where
LiteLLM users are running into an unhandled exception caused by trying
to append a chunk choice to the list of streamed chunks at that choice
index.

The root of the issue comes down to where the user is setting the
parameter for number of choices in the streamed response. If the `n`
parameter was set as a kwarg, then there would be no issue. However, if
the parameter was set elsewhere (e.g. in the user's proxy config file),
then the `_streamed_chunks` list would incorrectly expect 1 choice which
would lead to an indexing error.

I was able to reproduce the issue with the following config and client
request:

Config File
```
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: "os.environ/OPENAI_API_KEY"
      n: 2
      temperature: 0.2
```

Client Request
```
import os
import litellm
import asyncio
from litellm import acompletion

litellm.api_key = os.environ["OPENAI_API_KEY"]
async def acompletion_proxy():
    messages = [{ "content": "What color is the sky?","role": "user"}]
    response = await acompletion(model="gpt-3.5-turbo", messages=messages, api_base="http://0.0.0.0:4000/", stream=True)
    async for item in response:
        print(item)

if __name__ == "__main__":
    asyncio.run(acompletion_proxy())
```

This lead to the following error:
```
Traceback (most recent call last):
  File "/Users/nicole.cybul/Documents/ML Observability/scripts/integrations/_simple_litellm_script.py", line 14, in <module>
    for item in response:
  File "/Users/nicole.cybul/Documents/ML Observability/scripts/.venv/lib/python3.11/site-packages/ddtrace/contrib/internal/litellm/utils.py", line 63, in __iter__
    _loop_handler(chunk, self._streamed_chunks)
  File "/Users/nicole.cybul/Documents/ML Observability/scripts/.venv/lib/python3.11/site-packages/ddtrace/contrib/internal/litellm/utils.py", line 131, in _loop_handler
    streamed_chunks[choice.index].append(choice)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
IndexError: list index out of range
```

The fix in this PR uses a `defaultdict(list)` which should be robust
against these types of indexing issues.

With this new fix, the same request leads to a successful response and
this
[trace](https://dd.datad0g.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZeEzsIo5PnrZAAAABhBWmVFenNJb0FBRDZPcUs4ZV94bEFBQUEAAAAkZjE5Nzg0Y2UtZTFlNy00YzI2LTk5MWQtMjg3YmJlNGM2ZTllAAAAIg%22%7D%5D&spanId=15697199987942968683&start=1750278947467&end=1750279847467&paused=false)
in the product.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
We include the rust extensions in the computation of the hash for native
extensions. This allows us to cache the compilation result of rust
builds and reuse them in CI.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [ ] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
…setup.py (#13732)

With the recent changes that have introduced some incremental build
support for CMake, there doesn't seem to be a reason to maintain this
extra feature, so we can revert the change and simplify setup.py a bit.

- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

- [x] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
@brettlangdon brettlangdon force-pushed the brettlangdon/perf.optimize.span.finish branch from ee0f0a2 to 19270da Compare June 20, 2025 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants