
Commit 6858602

timfish and AbhiPrasad authored
feat(cloudflare): Add instrumentWorkflowWithSentry to instrument workflows (#16672)
- Closes #16458

It was tricky to instrument Cloudflare workflows! The code for a workflow might look simple, but there are a lot of clever shenanigans going on under the hood in the runtime to allow suspension of workflows. Workflows can be hibernated at any time, and all state/context inside your workflow class and elsewhere is lost. Ideally we want all of our step runs to have the same `trace_id`, so that all steps in a workflow run are linked together, and all steps should have the same sampling decision.

To work around the state limitations, we use the workflow `instanceId` as the Sentry `trace_id`, and its last 4 characters are used to generate the `sample_rand` used in the sampling decision. Cloudflare uses UUIDs by default for `instanceId`, but users do have the option of passing their own IDs. If users are supplying their own `instanceId`s, they need to be both random and a UUID (32 hex characters, with or without hyphens), or the Sentry instrumentation will throw an error.

Points worthy of note:

- We use an `enableDedupe` config option (docs hidden) which removes the `dedupeIntegration` for workflows. We want to get duplicate errors for step retries.
- We have to wrap the Cloudflare `WorkflowStep` object in another class. The Cloudflare step object is native, so its properties can't be overridden or proxied.
- Our wrapping does end up in all the stack traces, but those frames should be automatically hidden because they will be evaluated as `in_app: false`.
- We don't instrument `step.sleep`, `step.sleepUntil` or `step.waitForEvent` (the wrapper just passes them through) because code doesn't run after the Cloudflare native function returns ☹️
- Calling `setPropagationContext` directly on the isolation scope didn't work. It needed another `withScope` inside for `setPropagationContext` to work. @mydea is that expected?
- This PR doesn't yet capture:
  - The payload supplied when the workflow run was started
  - The return results from the workflow steps

Here is an example trace showing the final step failing (throwing) 6 times before completing successfully. The exponential retry backoff is clearly visible.

<img width="1233" alt="image" src="https://github.com/user-attachments/assets/1c6356b4-2416-439c-a842-ef942fce68b4" />

---------

Co-authored-by: Abhijeet Prasad <aprasad@sentry.io>
1 parent 0bee39d commit 6858602
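The `instanceId`-to-trace mapping described in the commit message can be sketched on its own. This is a minimal, dependency-free TypeScript sketch, assuming `instanceId` is a random (v4) UUID; `deriveTraceData` and `isSampled` are hypothetical helpers, not Sentry SDK exports, and the `sampleRand < sampleRate` rule in `isSampled` is an assumed decision rule. It departs from the commit in two small ways: it lowercases the trace ID, and it divides by `0x10000` instead of `0xffff`, so an ID ending in `ffff` yields a value just below 1 rather than exactly 1.

```typescript
// Same pattern as the commit's UUID_REGEX: 32 hex digits, hyphens optional.
const UUID_REGEX = /^[0-9a-f]{8}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{12}$/i;

interface DerivedTraceData {
  traceId: string; // 32 lowercase hex characters
  sampleRand: number; // in [0, 1)
}

// Hypothetical helper: every step of one workflow instance calls this with the
// same instanceId, so every step gets the same traceId and sampleRand without
// any state surviving hibernation.
function deriveTraceData(instanceId: string): DerivedTraceData {
  if (!UUID_REGEX.test(instanceId)) {
    throw new Error(`Invalid workflow instanceId "${instanceId}": expected a random UUID`);
  }
  // Strip hyphens and lowercase (assumption: trace IDs should be lowercase hex).
  const traceId = instanceId.replace(/-/g, '').toLowerCase();
  // The last 16 bits of a v4 UUID are random; dividing by 0x10000 keeps the result below 1.
  const sampleRand = parseInt(traceId.slice(-4), 16) / 0x10000;
  return { traceId, sampleRand };
}

// Hypothetical decision rule (assumed): sample when sampleRand < sampleRate.
function isSampled(instanceId: string, sampleRate: number): boolean {
  return deriveTraceData(instanceId).sampleRand < sampleRate;
}
```

Because the decision is a pure function of the ID, a step that runs after hibernation, or on a retry, reaches the same sampling decision as the steps before it.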


9 files changed, +556 -10 lines changed

packages/cloudflare/package.json

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@
     }
   },
   "devDependencies": {
-    "@cloudflare/workers-types": "4.20240725.0",
+    "@cloudflare/workers-types": "4.20250620.0",
     "@types/node": "^18.19.1",
     "wrangler": "^3.67.1"
   },

packages/cloudflare/src/client.ts

Lines changed: 8 additions & 2 deletions
@@ -29,8 +29,14 @@ export class CloudflareClient extends ServerRuntimeClient<CloudflareClientOption
   }
 }

-// eslint-disable-next-line @typescript-eslint/no-empty-interface
-interface BaseCloudflareOptions {}
+interface BaseCloudflareOptions {
+  /**
+   * @ignore Used internally to disable the deDupeIntegration for workflows.
+   * @hidden Used internally to disable the deDupeIntegration for workflows.
+   * @default true
+   */
+  enableDedupe?: boolean;
+}

 /**
  * Configuration options for the Sentry Cloudflare SDK

packages/cloudflare/src/index.ts

Lines changed: 2 additions & 0 deletions
@@ -110,4 +110,6 @@ export { fetchIntegration } from './integrations/fetch';

 export { instrumentD1WithSentry } from './d1';

+export { instrumentWorkflowWithSentry } from './workflows';
+
 export { setAsyncLocalStorageAsyncContextStrategy } from './async';

packages/cloudflare/src/pages-plugin.ts

Lines changed: 3 additions & 1 deletion
@@ -49,6 +49,8 @@ export function sentryPagesPlugin<
   setAsyncLocalStorageAsyncContextStrategy();
   return context => {
     const options = typeof handlerOrOptions === 'function' ? handlerOrOptions(context) : handlerOrOptions;
-    return wrapRequestHandler({ options, request: context.request, context }, () => context.next());
+    return wrapRequestHandler({ options, request: context.request, context: { ...context, props: {} } }, () =>
+      context.next(),
+    );
   };
 }

packages/cloudflare/src/sdk.ts

Lines changed: 3 additions & 1 deletion
@@ -20,7 +20,9 @@ import { defaultStackParser } from './vendor/stacktrace';
 export function getDefaultIntegrations(options: CloudflareOptions): Integration[] {
   const sendDefaultPii = options.sendDefaultPii ?? false;
   return [
-    dedupeIntegration(),
+    // The Dedupe integration should not be used in workflows because we want to
+    // capture all step failures, even if they are the same error.
+    ...(options.enableDedupe === false ? [] : [dedupeIntegration()]),
     // TODO(v10): Replace with `eventFiltersIntegration` once we remove the deprecated `inboundFiltersIntegration`
     // eslint-disable-next-line deprecation/deprecation
     inboundFiltersIntegration(),
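The `enableDedupe` check only drops the integration on an explicit `false`, so omitting the option keeps the existing default, while `workflowStepWithSentry` always forces it off via `init({ ...options, enableDedupe: false })`. A dependency-free sketch of that conditional spread, using hypothetical stand-in integration objects (`IntegrationLike`, `dedupe`, `inboundFilters`) rather than the real `@sentry/core` factories:

```typescript
// Hypothetical stand-in for Sentry's Integration type; only `name` matters here.
interface IntegrationLike {
  name: string;
}

interface SketchOptions {
  enableDedupe?: boolean;
}

// Stand-in factories; in the SDK these come from `@sentry/core`.
const dedupe = (): IntegrationLike => ({ name: 'Dedupe' });
const inboundFilters = (): IntegrationLike => ({ name: 'InboundFilters' });

function defaultIntegrations(options: SketchOptions): IntegrationLike[] {
  return [
    // `undefined` and `true` both keep Dedupe; only an explicit `false` drops it.
    ...(options.enableDedupe === false ? [] : [dedupe()]),
    inboundFilters(),
  ];
}

// A workflow step overrides whatever the user passed, because the spread comes first.
function workflowStepOptions(userOptions: SketchOptions): SketchOptions {
  return { ...userOptions, enableDedupe: false };
}
```

Placing `enableDedupe: false` after the spread means a user-supplied `enableDedupe: true` cannot re-enable deduplication inside workflow steps.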

packages/cloudflare/src/workflows.ts

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
+import type { PropagationContext } from '@sentry/core';
+import {
+  captureException,
+  flush,
+  SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN,
+  SEMANTIC_ATTRIBUTE_SENTRY_SOURCE,
+  startSpan,
+  withIsolationScope,
+  withScope,
+} from '@sentry/core';
+import type {
+  WorkflowEntrypoint,
+  WorkflowEvent,
+  WorkflowSleepDuration,
+  WorkflowStep,
+  WorkflowStepConfig,
+  WorkflowStepEvent,
+  WorkflowTimeoutDuration,
+} from 'cloudflare:workers';
+import { setAsyncLocalStorageAsyncContextStrategy } from './async';
+import type { CloudflareOptions } from './client';
+import { addCloudResourceContext } from './scope-utils';
+import { init } from './sdk';
+
+const UUID_REGEX = /^[0-9a-f]{8}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{12}$/i;
+
+function propagationContextFromInstanceId(instanceId: string): PropagationContext {
+  // Validate and normalize traceId - should be a valid UUID with or without hyphens
+  if (!UUID_REGEX.test(instanceId)) {
+    throw new Error("Invalid 'instanceId' for workflow: Sentry requires random UUIDs for instanceId.");
+  }
+
+  // Remove hyphens to get UUID without hyphens
+  const traceId = instanceId.replace(/-/g, '');
+
+  // Derive sampleRand from last 4 characters of the random UUID
+  //
+  // We cannot store any state between workflow steps, so we derive the
+  // sampleRand from the traceId itself. This ensures that the sampling is
+  // consistent across all steps in the same workflow instance.
+  const sampleRand = parseInt(traceId.slice(-4), 16) / 0xffff;
+
+  return {
+    traceId,
+    sampleRand,
+  };
+}
+
+async function workflowStepWithSentry<V>(
+  instanceId: string,
+  options: CloudflareOptions,
+  callback: () => V,
+): Promise<V> {
+  setAsyncLocalStorageAsyncContextStrategy();
+
+  return withIsolationScope(async isolationScope => {
+    const client = init({ ...options, enableDedupe: false });
+    isolationScope.setClient(client);
+
+    addCloudResourceContext(isolationScope);
+
+    return withScope(async scope => {
+      const propagationContext = propagationContextFromInstanceId(instanceId);
+      scope.setPropagationContext(propagationContext);
+
+      // eslint-disable-next-line no-return-await
+      return await callback();
+    });
+  });
+}
+
+class WrappedWorkflowStep implements WorkflowStep {
+  public constructor(
+    private _instanceId: string,
+    private _ctx: ExecutionContext,
+    private _options: CloudflareOptions,
+    private _step: WorkflowStep,
+  ) {}
+
+  public async do<T extends Rpc.Serializable<T>>(name: string, callback: () => Promise<T>): Promise<T>;
+  public async do<T extends Rpc.Serializable<T>>(
+    name: string,
+    config: WorkflowStepConfig,
+    callback: () => Promise<T>,
+  ): Promise<T>;
+  public async do<T extends Rpc.Serializable<T>>(
+    name: string,
+    configOrCallback: WorkflowStepConfig | (() => Promise<T>),
+    maybeCallback?: () => Promise<T>,
+  ): Promise<T> {
+    const userCallback = (maybeCallback || configOrCallback) as () => Promise<T>;
+    const config = typeof configOrCallback === 'function' ? undefined : configOrCallback;
+
+    const instrumentedCallback: () => Promise<T> = async () => {
+      return workflowStepWithSentry(this._instanceId, this._options, async () => {
+        return startSpan(
+          {
+            op: 'function.step.do',
+            name,
+            attributes: {
+              'cloudflare.workflow.timeout': config?.timeout,
+              'cloudflare.workflow.retries.backoff': config?.retries?.backoff,
+              'cloudflare.workflow.retries.delay': config?.retries?.delay,
+              'cloudflare.workflow.retries.limit': config?.retries?.limit,
+              [SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN]: 'auto.faas.cloudflare.workflow',
+              [SEMANTIC_ATTRIBUTE_SENTRY_SOURCE]: 'task',
+            },
+          },
+          async span => {
+            try {
+              const result = await userCallback();
+              span.setStatus({ code: 1 });
+              return result;
+            } catch (error) {
+              captureException(error, { mechanism: { handled: true, type: 'cloudflare' } });
+              throw error;
+            } finally {
+              this._ctx.waitUntil(flush(2000));
+            }
+          },
+        );
+      });
+    };
+
+    return config ? this._step.do(name, config, instrumentedCallback) : this._step.do(name, instrumentedCallback);
+  }
+
+  public async sleep(name: string, duration: WorkflowSleepDuration): Promise<void> {
+    return this._step.sleep(name, duration);
+  }
+
+  public async sleepUntil(name: string, timestamp: Date | number): Promise<void> {
+    return this._step.sleepUntil(name, timestamp);
+  }
+
+  public async waitForEvent<T extends Rpc.Serializable<T>>(
+    name: string,
+    options: { type: string; timeout?: WorkflowTimeoutDuration | number },
+  ): Promise<WorkflowStepEvent<T>> {
+    return this._step.waitForEvent<T>(name, options);
+  }
+}
+
+/**
+ * Instruments a Cloudflare Workflow class with Sentry.
+ *
+ * @example
+ * ```typescript
+ * const InstrumentedWorkflow = instrumentWorkflowWithSentry(
+ *   (env) => ({ dsn: env.SENTRY_DSN }),
+ *   MyWorkflowClass
+ * );
+ *
+ * export default InstrumentedWorkflow;
+ * ```
+ *
+ * @param optionsCallback - Function that returns Sentry options to initialize Sentry
+ * @param WorkflowClass - The workflow class to instrument
+ * @returns Instrumented workflow class with the same interface
+ */
+export function instrumentWorkflowWithSentry<
+  E, // Environment type
+  P, // Payload type
+  T extends WorkflowEntrypoint<E, P>, // WorkflowEntrypoint type
+  C extends new (ctx: ExecutionContext, env: E) => T, // Constructor type of the WorkflowEntrypoint class
+>(optionsCallback: (env: E) => CloudflareOptions, WorkFlowClass: C): C {
+  return new Proxy(WorkFlowClass, {
+    construct(target: C, args: [ctx: ExecutionContext, env: E], newTarget) {
+      const [ctx, env] = args;
+      const options = optionsCallback(env);
+      const instance = Reflect.construct(target, args, newTarget) as T;
+      return new Proxy(instance, {
+        get(obj, prop, receiver) {
+          if (prop === 'run') {
+            return async function (event: WorkflowEvent<P>, step: WorkflowStep): Promise<unknown> {
+              return obj.run.call(obj, event, new WrappedWorkflowStep(event.instanceId, ctx, options, step));
+            };
+          }
+          return Reflect.get(obj, prop, receiver);
+        },
+      });
+    },
+  }) as C;
+}
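`workflows.ts` combines two wrapping techniques: a `Proxy` around the user's workflow class (ordinary JavaScript, so it can be proxied) and a plain delegating class around the native `WorkflowStep`, which the commit notes cannot be overridden or proxied. The sketch below reproduces that shape without any Cloudflare or Sentry dependency. `StepLike`, `EventLike`, `WorkflowLike`, `RecordingStep`, `instrumentWorkflowClass` and `DoublingWorkflow` are hypothetical names, and a log array stands in for Sentry spans.

```typescript
// Hypothetical, simplified stand-ins for Cloudflare's WorkflowStep and WorkflowEvent.
interface StepLike {
  do<T>(name: string, callback: () => Promise<T>): Promise<T>;
}

interface EventLike {
  instanceId: string;
}

interface WorkflowLike {
  run(event: EventLike, step: StepLike): Promise<unknown>;
}

// Composition wrapper: the native step object is never modified, only delegated to.
class RecordingStep implements StepLike {
  public constructor(
    private readonly _inner: StepLike,
    private readonly _instanceId: string,
    private readonly _log: string[],
  ) {}

  public do<T>(name: string, callback: () => Promise<T>): Promise<T> {
    // The native step receives an instrumented callback, so every attempt
    // the runtime makes runs through the wrapper.
    return this._inner.do(name, async () => {
      this._log.push(`start ${this._instanceId}:${name}`);
      try {
        return await callback();
      } finally {
        this._log.push(`end ${this._instanceId}:${name}`);
      }
    });
  }
}

type WorkflowConstructor = new (...args: any[]) => WorkflowLike;

// Proxy the user's class: intercept `new`, then intercept `run` on the instance.
function instrumentWorkflowClass<C extends WorkflowConstructor>(WorkflowClass: C, log: string[]): C {
  return new Proxy(WorkflowClass, {
    construct(target, args, newTarget) {
      const instance = Reflect.construct(target, args, newTarget) as WorkflowLike;
      return new Proxy(instance, {
        get(obj, prop, receiver) {
          if (prop === 'run') {
            return (event: EventLike, step: StepLike) =>
              obj.run.call(obj, event, new RecordingStep(step, event.instanceId, log));
          }
          return Reflect.get(obj, prop, receiver);
        },
      });
    },
  });
}

// Toy workflow used to exercise the wrapper.
class DoublingWorkflow implements WorkflowLike {
  public async run(_event: EventLike, step: StepLike): Promise<unknown> {
    return step.do('double', async () => 21 * 2);
  }
}
```

Because `Reflect.construct` is called with the proxy as `newTarget`, instances still inherit from the original class prototype, so `instanceof` checks against the user's class keep working after instrumentation.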
