Skip to content

Fix stdio_client kill process after timeout #555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cristipufu
Copy link

@cristipufu cristipufu commented Apr 21, 2025

Motivation and Context

Be consistent across platforms (Windows, Linux, etc). Some MCP Servers don't react to SIGTERM, so stdio_client will hang indefinitely when exiting the context manager.

Reference: #372

How Has This Been Tested?

Yes

Breaking Changes

No

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Copy link
Contributor

@ihrpr ihrpr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cristipufu for working on this!

Do you have steps to reproduce? How this changes were tested?

@ihrpr ihrpr modified the milestones: r-05-25, stdio shutdown May 28, 2025
await process.wait()
except TimeoutError:
# Force kill if it doesn't terminate
process.kill()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same TerminateProcess is called on Windows... Did this worked locally?

https://anyio.readthedocs.io/en/stable/api.html#anyio.abc.Process.kill

@surya-prakash-susarla
Copy link

Slightly orthogonal but this does not handle child process spawns which can cause the parent to leave ghost processes which are being re-parented to the root process. Addressed it here: #850

felixweinberger added a commit that referenced this pull request Jun 26, 2025
Add comprehensive tests that validate stdio client cleanup behavior:
- Universal timeout test ensures cleanup completes within reasonable time even with long-running processes
- Immediate completion test ensures fast processes remain fast and timeout mechanism doesn't introduce delays
- Cross-platform compatibility with different commands for Windows vs Unix

These tests will help ensure PR #555's timeout mechanism works correctly without regressing normal operation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
felixweinberger added a commit that referenced this pull request Jun 26, 2025
Changes from PR #555 by @cristipufu:
- Remove Windows-specific terminate_windows_process function
- Apply uniform 2-second timeout mechanism across all platforms:
  - Try process.terminate() first (SIGTERM on Unix, similar on Windows)
  - Wait up to 2 seconds for graceful termination
  - If timeout occurs, force process.kill() (SIGKILL equivalent)

This ensures consistent behavior between Windows and Unix systems when
cleaning up stdio client processes, preventing hanging when MCP servers
don't respond to SIGTERM.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
felixweinberger added a commit that referenced this pull request Jun 26, 2025
This change demonstrates a simpler approach than PR #555's timeout mechanism:

1. **Evidence**: Extensive testing shows PR #559 stream cleanup alone prevents
   hanging, even with servers that ignore SIGTERM and keep streams open.

2. **Simplification**: Removes all timeout logic and Windows-specific termination
   function in favor of unified `process.terminate()` + stream cleanup.

3. **Benefits**:
   - Less code complexity (no timeout handling, no platform branching)
   - Preserves proven stream cleanup protection from PR #559
   - Makes behavior consistent across all platforms
   - All existing timeout tests still pass

4. **Risk reduction**: Avoids changing process termination semantics while
   maintaining hanging protection through stream cleanup.

The core insight: process hanging was caused by stream management issues
(solved by PR #559), not termination timing issues (targeted by PR #555).
felixweinberger added a commit that referenced this pull request Jun 26, 2025
Copy of #555

Testing shows PR #559 stream cleanup alone prevents hanging,
even with servers that ignore SIGTERM and keep streams open.

Removes all timeout logic and Windows-specific termination
function in favor of unified `process.terminate()`.
felixweinberger added a commit that referenced this pull request Jun 26, 2025
NOTE: this test FAIL without #555

This test verifies that stdio_client properly handles processes that
ignore SIGTERM but respond to SIGINT, preventing hanging issues with
Node.js servers and other interactive tools that expect Ctrl+C signals.
felixweinberger added a commit that referenced this pull request Jun 26, 2025
**NOTE: These tests FAIL without the changes introduced in #559**

- test_stdio_client_universal_timeout
- test_stdio_client_immediate_completion

await read_stream.aclose()
await write_stream.aclose()
await read_stream_writer.aclose()
await write_stream_reader.aclose()

These tests verify that stdio_client completes cleanup within reasonable
time for both slow-terminating and fast-exiting processes, preventing
the hanging issues reported in #559.

**NOTE: This test FAILS without the changes introduced in #555**

- test_stdio_client_sigint_only_process

try:
    process.terminate()
    with anyio.fail_after(2.0):
        await process.wait()
except TimeoutError:
    # If process doesn't terminate in time, force kill it
    process.kill()
felixweinberger added a commit that referenced this pull request Jun 26, 2025
**NOTE: These tests FAIL without the changes introduced in #559**

- test_stdio_client_universal_timeout
- test_stdio_client_immediate_completion

await read_stream.aclose()
await write_stream.aclose()
await read_stream_writer.aclose()
await write_stream_reader.aclose()

These tests verify that stdio_client completes cleanup within reasonable
time for both slow-terminating and fast-exiting processes, preventing
the hanging issues reported in #559.

**NOTE: This test FAILS without the changes introduced in #555**

- test_stdio_client_sigint_only_process

try:
    process.terminate()
    with anyio.fail_after(2.0):
        await process.wait()
except TimeoutError:
    # If process doesn't terminate in time, force kill it
    process.kill()
felixweinberger added a commit that referenced this pull request Jun 26, 2025
**NOTE: These tests FAIL without the changes introduced in #559**

- test_stdio_client_universal_timeout
- test_stdio_client_immediate_completion

await read_stream.aclose()
await write_stream.aclose()
await read_stream_writer.aclose()
await write_stream_reader.aclose()

These tests verify that stdio_client completes cleanup within reasonable
time for both slow-terminating and fast-exiting processes, preventing
the hanging issues reported in #559.

**NOTE: This test FAILS without the changes introduced in #555**

- test_stdio_client_sigint_only_process

try:
    process.terminate()
    with anyio.fail_after(2.0):
        await process.wait()
except TimeoutError:
    # If process doesn't terminate in time, force kill it
    process.kill()

This test verifies that on UNIX systems MCP servers that don't respect
SIGTERM but e.g. SIGINT still get terminated after a grace period.
felixweinberger added a commit that referenced this pull request Jun 26, 2025
felixweinberger added a commit that referenced this pull request Jun 26, 2025
**NOTE: These tests FAIL without the changes introduced in #559**

- test_stdio_client_universal_timeout
- test_stdio_client_immediate_completion

await read_stream.aclose()
await write_stream.aclose()
await read_stream_writer.aclose()
await write_stream_reader.aclose()

These tests verify that stdio_client completes cleanup within reasonable
time for both slow-terminating and fast-exiting processes, preventing
the hanging issues reported in #559.

**NOTE: This test FAILS without the changes introduced in #555**

- test_stdio_client_sigint_only_process

try:
    process.terminate()
    with anyio.fail_after(2.0):
        await process.wait()
except TimeoutError:
    # If process doesn't terminate in time, force kill it
    process.kill()

This test verifies that on UNIX systems MCP servers that don't respect
SIGTERM but e.g. SIGINT still get terminated after a grace period.
Copy link
Contributor

@felixweinberger felixweinberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cristipufu, thank you for working on this and for submitting this PR!

Having looked into this in some detail I think your simplification makes sense:

I created a draft PR #1044 with 3 new regression tests in an attempt to actually reproduce the hanging resolved by both this change (#555) and #559.

Please take a look if that reflects what you were going for here - if yes could you rebase your PR here on latest main?

felixweinberger added a commit that referenced this pull request Jun 26, 2025
**NOTE: These tests FAIL without the changes introduced in #559**

- test_stdio_client_universal_timeout
- test_stdio_client_immediate_completion

await read_stream.aclose()
await write_stream.aclose()
await read_stream_writer.aclose()
await write_stream_reader.aclose()

These tests verify that stdio_client completes cleanup within reasonable
time for both slow-terminating and fast-exiting processes, preventing
the hanging issues reported in #559.

**NOTE: This test FAILS without the changes introduced in #555**

- test_stdio_client_sigint_only_process

try:
    process.terminate()
    with anyio.fail_after(2.0):
        await process.wait()
except TimeoutError:
    # If process doesn't terminate in time, force kill it
    process.kill()

This test verifies that on UNIX systems MCP servers that don't respect
SIGTERM but e.g. SIGINT still get terminated after a grace period.
felixweinberger added a commit that referenced this pull request Jun 26, 2025
**NOTE: These tests FAIL without the changes introduced in #559**

- test_stdio_client_universal_timeout
- test_stdio_client_immediate_completion

await read_stream.aclose()
await write_stream.aclose()
await read_stream_writer.aclose()
await write_stream_reader.aclose()

These tests verify that stdio_client completes cleanup within reasonable
time for both slow-terminating and fast-exiting processes, preventing
the hanging issues reported in #559.

**NOTE: This test FAILS without the changes introduced in #555**

- test_stdio_client_sigint_only_process

try:
    process.terminate()
    with anyio.fail_after(2.0):
        await process.wait()
except TimeoutError:
    # If process doesn't terminate in time, force kill it
    process.kill()

This test verifies that on UNIX systems MCP servers that don't respect
SIGTERM but e.g. SIGINT still get terminated after a grace period.
felixweinberger added a commit that referenced this pull request Jun 26, 2025
**NOTE: These tests FAIL without the changes introduced in #559**

- test_stdio_client_universal_timeout
- test_stdio_client_immediate_completion

await read_stream.aclose()
await write_stream.aclose()
await read_stream_writer.aclose()
await write_stream_reader.aclose()

These tests verify that stdio_client completes cleanup within reasonable
time for both slow-terminating and fast-exiting processes, preventing
the hanging issues reported in #559.

**NOTE: This test FAILS without the changes introduced in #555**

- test_stdio_client_sigint_only_process

try:
    process.terminate()
    with anyio.fail_after(2.0):
        await process.wait()
except TimeoutError:
    # If process doesn't terminate in time, force kill it
    process.kill()

This test verifies that on UNIX systems MCP servers that don't respect
SIGTERM but e.g. SIGINT still get terminated after a grace period.
felixweinberger added a commit that referenced this pull request Jun 26, 2025
**NOTE: These tests FAIL without the changes introduced in #559**

- test_stdio_client_universal_timeout
- test_stdio_client_immediate_completion

await read_stream.aclose()
await write_stream.aclose()
await read_stream_writer.aclose()
await write_stream_reader.aclose()

These tests verify that stdio_client completes cleanup within reasonable
time for both slow-terminating and fast-exiting processes, preventing
the hanging issues reported in #559.

**NOTE: This test FAILS without the changes introduced in #555**

- test_stdio_client_sigint_only_process

try:
    process.terminate()
    with anyio.fail_after(2.0):
        await process.wait()
except TimeoutError:
    # If process doesn't terminate in time, force kill it
    process.kill()

This test verifies that on UNIX systems MCP servers that don't respect
SIGTERM but e.g. SIGINT still get terminated after a grace period.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants