omniparser-autogui-mcp
by NON906
A Python-based MCP server that uses Microsoft OmniParser to parse the active screen visually and then drives the GUI automatically through mouse and keyboard input. It primarily targets Windows but can run cross-platform.
Open Source
Installation
1. Prerequisites
• Python ≥3.9 installed and added to PATH
• Windows/macOS/Linux desktop session (the server controls the local GUI)
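The two Python prerequisites can be checked before cloning. The following is a minimal sketch using only the standard library; the helper names python_ok and on_path are hypothetical and not part of the project.

```python
import shutil
import sys

# Minimum version taken from the prerequisites above.
MIN_PYTHON = (3, 9)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def on_path(executable="python"):
    """Return True if the named executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("python found on PATH:", on_path())
```

On systems where the interpreter is installed as python3, call on_path("python3") instead.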
2. Clone the repository
git clone https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
3. Install Python dependencies
# If a requirements.txt file exists
pip install -r requirements.txt
# Otherwise, if the project ships a pyproject.toml or setup.py, do an editable install
pip install -e .
4. Configuration (optional)
• Create .env in the project root to override defaults such as server port, screen-capture interval, etc.
• If headless execution is required on Linux, start a virtual display (e.g., xvfb-run python server.py)
Example .env:
MCP_PORT=8765
MCP_AUTH_TOKEN=your-long-token
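To illustrate how such a file could be consumed, here is a minimal standard-library sketch that reads KEY=VALUE pairs and lets real environment variables take precedence. It is not the server's actual loader: the function names are hypothetical, and the fallback port of 8765 is an assumption borrowed from the example above, not a documented default.

```python
import os

# Assumption: fallback port copied from the example .env, not a documented default.
DEFAULT_PORT = 8765


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(path=".env", environ=None):
    """Merge values from the .env file with the process environment."""
    environ = os.environ if environ is None else environ
    file_values = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            file_values = parse_env_text(fh.read())
    merged = {**file_values, **environ}  # the real environment wins over the file
    return {
        "port": int(merged.get("MCP_PORT", DEFAULT_PORT)),
        "auth_token": merged.get("MCP_AUTH_TOKEN"),
    }


if __name__ == "__main__":
    print(load_settings())
```

Letting the process environment override the file keeps secrets such as MCP_AUTH_TOKEN out of the repository when they are injected at launch time.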
5. Launch the MCP server
python server.py  # or the project's actual entry point; check the README or cli.py
6. Connect a client
Point any MCP-compatible client at ws://<host>:<MCP_PORT> using the auth token you configured.
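The endpoint address can be assembled from the host and the configured port. The sketch below only builds and validates the ws:// URL; build_ws_url is a hypothetical helper, and because the source does not say how the auth token is transmitted, that part is deliberately left out.

```python
def build_ws_url(host, port):
    """Return a ws://<host>:<port> URL, validating the port range."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # Bare IPv6 literals must be wrapped in brackets inside a URL.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"ws://{host}:{port}"


if __name__ == "__main__":
    print(build_ws_url("localhost", 8765))
```

Pass the resulting string to whichever MCP client is in use, together with the token in the form that client and the server expect.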
License: MIT License
Updated 7/30/2025