
# ScrapeGraph


**ScrapeGraphTools** enable an Agent to extract structured data from webpages, convert content to markdown, and retrieve raw HTML content using the ScrapeGraphAI API.

The toolkit provides six functions:

1. **smartscraper**: Extract structured data using natural language prompts
2. **markdownify**: Convert web pages to markdown format
3. **searchscraper**: Search the web and extract information
4. **crawl**: Crawl websites with structured data extraction
5. **agentic_crawler**: Perform automated browser actions with optional AI extraction
6. **scrape**: Get raw HTML content from websites

The `scrape` function is particularly useful when you need:

* The complete HTML source of a page
* Raw content for further processing
* HTML structure analysis
* Content you want to parse with your own tooling

All functions can render JavaScript-heavy pages when you set `render_heavy_js=True` on the toolkit.

## Prerequisites

The following examples require the `scrapegraph-py` library.

```shell theme={null}
uv pip install -U scrapegraph-py
```

ScrapeGraphTools authenticates with the ScrapeGraphAI API. Set your key in the `SGAI_API_KEY` environment variable, or pass it to the toolkit with the `api_key` parameter:

```shell theme={null}
export SGAI_API_KEY="YOUR_SGAI_API_KEY"
```
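The `api_key` row under Toolkit Params documents a simple fallback: an explicit key wins, otherwise `SGAI_API_KEY` is read. The sketch below models that lookup so you can fail early with a clear message before an agent runs. It is a hypothetical helper (`resolve_sgai_api_key` is not part of Agno or `scrapegraph-py`), not the toolkit's source, and it assumes an empty string counts as "not set".

```python
import os
from typing import Optional


def resolve_sgai_api_key(api_key: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper mirroring the documented lookup order.

    An explicitly passed key takes precedence; otherwise fall back to the
    SGAI_API_KEY environment variable. Returns None if neither is set.
    """
    # Assumption for this sketch: an empty string is treated as "no key".
    if api_key:
        return api_key
    return os.environ.get("SGAI_API_KEY")
```

A startup script could call `resolve_sgai_api_key()` and exit with a helpful error when it returns `None`, instead of discovering the missing key at the first tool call.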

## Example

The following agent will extract structured data from a website using the smartscraper tool:

```python cookbook/14_tools/scrapegraph_tools.py theme={null}
from agno.agent import Agent
from agno.models.openai import OpenAIResponses
from agno.tools.scrapegraph import ScrapeGraphTools

agent_model = OpenAIResponses(id="gpt-5.2")
scrapegraph_smartscraper = ScrapeGraphTools(enable_smartscraper=True)

agent = Agent(
    tools=[scrapegraph_smartscraper], model=agent_model, markdown=True, stream=True
)

agent.print_response("""
Use smartscraper to extract the following from https://www.wired.com/category/science/:
- News articles
- Headlines
- Images
- Links
- Author
""")
```

### Raw HTML Scraping

Get complete HTML content from websites for custom processing:

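```python cookbook/14_tools/scrapegraph_tools.py theme={null}
from agno.agent import Agent
from agno.models.openai import OpenAIResponses
from agno.tools.scrapegraph import ScrapeGraphTools

agent_model = OpenAIResponses(id="gpt-5.2")

# Enable only the scrape function for raw HTML content
# (smartscraper is enabled by default, so turn it off explicitly)
scrapegraph_scrape = ScrapeGraphTools(enable_scrape=True, enable_smartscraper=False)

scrape_agent = Agent(
    tools=[scrapegraph_scrape],
    model=agent_model,
    markdown=True,
    stream=True,
)

scrape_agent.print_response(
    "Use the scrape tool to get the complete raw HTML content from https://en.wikipedia.org/wiki/2025_FIFA_Club_World_Cup"
)
```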
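Because `scrape` returns raw HTML for you to process yourself, the standard-library `html.parser` module is often enough for simple post-processing. This minimal sketch pulls the page title and link targets out of an HTML string. It assumes you already have the HTML as a plain string; how the tool wraps its response (for example, inside a JSON payload) is not covered here, and `LinkCollector`, `summarize_html`, and `sample_html` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the <title> text and every href found on <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize_html(html: str) -> tuple[str, list[str]]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


# Hypothetical stand-in for HTML returned by the scrape tool.
sample_html = """
<html><head><title>Example page</title></head>
<body>
  <a href="/wiki/Football">Football</a>
  <a href="https://example.com/about">About</a>
  <a name="anchor-only">No href</a>
</body></html>
"""

title, links = summarize_html(sample_html)
print(title)  # Example page
print(links)  # ['/wiki/Football', 'https://example.com/about']
```

For messier real-world pages, a dedicated HTML library may be more forgiving, but `html.parser` keeps the example dependency-free.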

### All Functions with JavaScript Rendering

Enable all ScrapeGraph functions with heavy JavaScript support:

```python cookbook/14_tools/scrapegraph_tools.py theme={null}
from agno.agent import Agent
from agno.models.openai import OpenAIResponses
from agno.tools.scrapegraph import ScrapeGraphTools

agent_model = OpenAIResponses(id="gpt-5.2")

# Enable all ScrapeGraph functions; render_heavy_js=True renders
# JavaScript-heavy pages before they are scraped
scrapegraph_all = Agent(
    tools=[ScrapeGraphTools(all=True, render_heavy_js=True)],
    model=agent_model,
    markdown=True,
    stream=True,
)

scrapegraph_all.print_response("""
Use any appropriate scraping method to extract comprehensive information from https://www.wired.com/category/science/:
- News articles and headlines
- Convert to markdown if needed
- Search for specific information
""")
```
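`render_heavy_js` matters most for single-page applications, whose initial HTML is often an almost empty shell that scripts fill in later. The hypothetical heuristic below (not part of ScrapeGraphTools) flags HTML that contains `<script>` tags but little visible text, which can hint that a page needs `render_heavy_js=True`. The 200-character threshold is an arbitrary assumption, and regex-based tag stripping is only a rough approximation of real HTML parsing.

```python
import re


def looks_like_js_shell(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: True if the HTML has scripts but almost no visible text."""
    # Drop <script> and <style> bodies, then strip all remaining tags.
    without_code = re.sub(
        r"<(script|style)\b.*?</\1>", " ", html, flags=re.S | re.I
    )
    text = re.sub(r"<[^>]+>", " ", without_code)
    # Count non-whitespace characters of the remaining visible text.
    visible_chars = len("".join(text.split()))
    has_scripts = re.search(r"<script\b", html, flags=re.I) is not None
    return has_scripts and visible_chars < min_text_chars


shell = '<div id="root"></div><script src="/app.js"></script>'
print(looks_like_js_shell(shell))  # True
```

A page that already ships its article text in the initial HTML would return `False` here, suggesting that the default (no heavy rendering) is likely sufficient.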

<Note>View the [Startup Analyst example](/cookbook/agents/startup-analyst-agent) </Note>

## Toolkit Params

| Parameter                | Type            | Default | Description                                                                                        |
| ------------------------ | --------------- | ------- | -------------------------------------------------------------------------------------------------- |
| `api_key`                | `Optional[str]` | `None`  | ScrapeGraph API key. If not provided, the `SGAI_API_KEY` environment variable is used.            |
| `enable_smartscraper`    | `bool`          | `True`  | Enable the smartscraper function for LLM-powered data extraction.                                  |
| `enable_markdownify`     | `bool`          | `False` | Enable the markdownify function for webpage to markdown conversion.                                |
| `enable_crawl`           | `bool`          | `False` | Enable the crawl function for website crawling and data extraction.                                |
| `enable_searchscraper`   | `bool`          | `False` | Enable the searchscraper function for web search and information extraction.                       |
| `enable_agentic_crawler` | `bool`          | `False` | Enable the agentic\_crawler function for automated browser actions and AI extraction.              |
| `enable_scrape`          | `bool`          | `False` | Enable the scrape function for retrieving raw HTML content from websites.                          |
| `render_heavy_js`        | `bool`          | `False` | Enable heavy JavaScript rendering for all scraping functions. Useful for SPAs and dynamic content. |
| `all`                    | `bool`          | `False` | Enable all available functions. When `True`, the individual `enable_*` flags are ignored.         |
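
The flag rules in the table above can be summarized as a small pure function. This is a model of the documented semantics (defaults taken from the table, with `all=True` overriding every `enable_*` flag), not the toolkit's implementation; `enabled_functions` and `DEFAULT_FLAGS` are hypothetical names, and the order of the returned list is arbitrary.

```python
from typing import Dict, List

# Documented defaults from the Toolkit Params table: only smartscraper is on.
DEFAULT_FLAGS: Dict[str, bool] = {
    "smartscraper": True,
    "markdownify": False,
    "crawl": False,
    "searchscraper": False,
    "agentic_crawler": False,
    "scrape": False,
}


def enabled_functions(all: bool = False, **enable_flags: bool) -> List[str]:
    """Model of the documented flag semantics (not the library's own code).

    `all` mirrors the toolkit parameter name (it shadows the builtin only
    inside this function). Keyword flags use the toolkit's names, such as
    enable_scrape=True. When all=True, the enable_* flags are ignored.
    """
    if all:
        return list(DEFAULT_FLAGS)
    flags = dict(DEFAULT_FLAGS)
    for name, value in enable_flags.items():
        key = name[len("enable_"):] if name.startswith("enable_") else None
        if key not in flags:
            raise ValueError(f"unknown flag: {name}")
        flags[key] = value
    return [name for name, on in flags.items() if on]


print(enabled_functions())  # ['smartscraper']
print(enabled_functions(enable_scrape=True, enable_smartscraper=False))  # ['scrape']
```

The second call matches the raw HTML scraping configuration: enabling `scrape` alone would still leave `smartscraper` on, because it defaults to `True`.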

## Toolkit Functions

| Function          | Description                                                                                                                                                                                                            |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `smartscraper`    | Extract structured data from a webpage using an LLM and a natural-language prompt. Parameters: url (str), prompt (str).                                                                                                |
| `markdownify`     | Convert a webpage to markdown format. Parameters: url (str).                                                                                                                                                           |
| `crawl`           | Crawl a website and extract structured data. Parameters: url (str), prompt (str), data\_schema (dict), cache\_website (bool), depth (int), max\_pages (int), same\_domain\_only (bool), batch\_size (int).             |
| `searchscraper`   | Search the web and extract information. Parameters: user\_prompt (str).                                                                                                                                                |
| `agentic_crawler` | Perform automated browser actions with optional AI extraction. Parameters: url (str), steps (List\[str]), use\_session (bool), user\_prompt (Optional\[str]), output\_schema (Optional\[dict]), ai\_extraction (bool). |
| `scrape`          | Get raw HTML content from a website. Useful for complete source code retrieval and custom processing. Parameters: website\_url (str), headers (Optional\[dict]).                                                       |
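
When an agent calls `crawl`, the model fills in these arguments itself. The sketch below spells out one plausible argument set so the shape of `data_schema` is concrete, plus a small type check against the types listed in the table. Assumptions: `data_schema` is treated as a JSON Schema-style dict (check the ScrapeGraphAI API docs for the exact format), the values are illustrative rather than recommended defaults, and `CRAWL_PARAM_TYPES` and `type_problems` are hypothetical helpers. The check does not flag missing parameters, because this page does not document which ones have defaults.

```python
import json

# Parameter names and types for `crawl`, copied from the Toolkit Functions table.
CRAWL_PARAM_TYPES = {
    "url": str,
    "prompt": str,
    "data_schema": dict,
    "cache_website": bool,
    "depth": int,
    "max_pages": int,
    "same_domain_only": bool,
    "batch_size": int,
}


def type_problems(args: dict) -> list[str]:
    """Report unknown names or wrong types (a bool is rejected where int is expected)."""
    problems = []
    for name, value in args.items():
        expected = CRAWL_PARAM_TYPES.get(name)
        if expected is None:
            problems.append(f"unknown parameter: {name}")
        elif not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        ):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


# Hypothetical JSON Schema describing what each crawled page should yield.
article_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "url": {"type": "string"},
    },
    "required": ["title", "url"],
}

# Illustrative values only; not recommended defaults.
crawl_args = {
    "url": "https://www.wired.com/category/science/",
    "prompt": "Extract the title, author, and URL of each article",
    "data_schema": article_schema,
    "cache_website": True,
    "depth": 2,
    "max_pages": 5,
    "same_domain_only": True,
    "batch_size": 1,
}

print(type_problems(crawl_args))  # []
```

Keeping the schema to plain JSON types (strings, numbers, booleans, lists, dicts) ensures the arguments serialize cleanly when passed to the tool.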

## Developer Resources

* View [Tools](https://github.com/agno-agi/agno/blob/main/libs/agno/agno/tools/scrapegraph.py)
* View [Tests](https://github.com/agno-agi/agno/blob/main/libs/agno/tests/unit/tools/test_scrapegraph.py)
