Scrape Functions

Scrape functions extract data from HTML pages using CSS selectors, similar to Home Assistant’s scrape integration. They’re useful for websites without APIs.

Configuration

function:
  type: scrape
  resource: https://example.com
  value_template: "{{ value }}"
  sensor:
    - name: sensor_name
      select: ".css-selector"
      value_template: "{{ value }}"

resource

string

required

The URL of the web page to scrape

sensor

array

required

List of data points to extract from the page

value_template

string

required

Template that combines scraped sensor values into the function result

Sensor Configuration

Each sensor extracts a piece of data from the page:

name

string

required

Variable name to use in the main value_template

select

string

required

CSS selector to find the element

value_template

string

Template to process the selected element’s text

attribute

string

Extract an HTML attribute instead of text (e.g., href, src)

CSS Selectors

Basic Selectors

.classname        /* Select by class */
#idname          /* Select by ID */
tagname          /* Select by tag */
[attribute]      /* Select by attribute */

Combinators

parent child           /* Descendant */
parent > child         /* Direct child */
element + sibling      /* Adjacent sibling */
element ~ sibling      /* General sibling */

Attribute Selectors

[href^="https"]        /* Starts with */
[href$=".pdf"]         /* Ends with */
[href*="example"]      /* Contains */
[data-id="123"]        /* Exact match */

Pseudo-classes

:first-child           /* First child */
:last-child            /* Last child */
:nth-child(2)          /* Nth child */
:not(.excluded)        /* Negation */

Advanced Features

Extract Attributes

Get HTML attributes instead of text:

sensor:
  - name: image_url
    select: "img.product-image"
    attribute: src
  - name: link
    select: "a.read-more"
    attribute: href

Multiple Elements

Extract all matching elements:

sensor:
  - name: all_prices
    select: ".price"
    index: all  # Returns a list

Headers

Add custom headers (e.g., for authentication):

function:
  type: scrape
  resource: https://example.com
  headers:
    User-Agent: "Mozilla/5.0"
    Cookie: "session=abc123"
  sensor:
    - name: data
      select: ".content"

Use Cases

Version Tracking

Monitor software versions and releases

Price Monitoring

Track product prices and availability

News Aggregation

Extract headlines and articles

Status Pages

Monitor service status pages

Best Practices

Inspect the page

Use browser DevTools to find the right CSS selectors. Right-click an element and select “Inspect” to see its HTML structure.

Handle missing elements

Pages may change. Use safe filters and defaults:

{{ value | default('Not found') }}

Respect robots.txt

Check the website’s robots.txt to ensure scraping is allowed. Add appropriate delays for rate limiting.

Add User-Agent

Some sites block requests without a User-Agent header:

headers:
  User-Agent: "Home Assistant Extended OpenAI Conversation"

Debugging

Test CSS selectors in browser console:

document.querySelector('.css-selector')
document.querySelectorAll('.css-selector')

Enable debug logging:

logger:
  logs:
    custom_components.extended_openai_conversation: debug

Examples

Get Home Assistant Version

Scrape the current version from home-assistant.io:

- spec:
    name: get_ha_version
    description: Use this function to get Home Assistant version
    parameters:
      type: object
      properties:
        dummy:
          type: string
          description: Not used (placeholder)
  function:
    type: scrape
    resource: https://www.home-assistant.io
    value_template: "version: {{version}}, release_date: {{release_date}}"
    sensor:
      - name: version
        select: ".current-version h1"
        value_template: '{{ value.split(":")[1] }}'
      - name: release_date
        select: ".release-date"
        value_template: '{{ value.lower() }}'

Scrape Product Price

- spec:
    name: get_product_price
    description: Get current price of a product
    parameters:
      type: object
      properties:
        url:
          type: string
          description: Product page URL
      required:
      - url
  function:
    type: scrape
    resource: "{{ url }}"
    value_template: "Price: {{ price }}, In Stock: {{ stock }}"
    sensor:
      - name: price
        select: ".price-current"
        value_template: '{{ value | replace("$", "") | float }}'
      - name: stock
        select: ".stock-status"
        value_template: '{{ "yes" if "in stock" in value.lower() else "no" }}'

Extract Multiple Items

- spec:
    name: get_news_headlines
    description: Get latest news headlines
    parameters:
      type: object
      properties:
        dummy:
          type: string
  function:
    type: scrape
    resource: https://news.ycombinator.com
    value_template: >-
      Top Stories:
      {% for i in range(1, 6) %}
      {{ i }}. {{ headlines[i-1] }}
      {% endfor %}
    sensor:
      - name: headlines
        select: ".titleline > a"
        value_template: '{{ value }}'
        index: all

Comparison: Scrape vs REST

Feature	Scrape	REST
Data format	HTML	JSON/XML
Requires API	No	Yes
Selector type	CSS	JSON path
Best for	Public websites	APIs with keys
Maintenance	Higher (HTML changes)	Lower (stable APIs)

Next Steps

REST Functions

Use APIs when available

Composite Functions

Combine scraping with processing

Scrape Integration

HA scrape integration docs

Getting Started

Functions

Skills

Configuration

Sensor Configuration

CSS Selectors

Advanced Features

Use Cases

Version Tracking

Price Monitoring

News Aggregation

Status Pages

Best Practices

Debugging

Examples

Get Home Assistant Version

Scrape Product Price

Extract Multiple Items

Comparison: Scrape vs REST

Next Steps

REST Functions

Composite Functions

Scrape Integration

Getting Started

Functions

Skills

​Configuration

​Sensor Configuration

​CSS Selectors

​Advanced Features

​Use Cases

Version Tracking

Price Monitoring

News Aggregation

Status Pages

​Best Practices

​Debugging

​Examples

​Get Home Assistant Version

​Scrape Product Price

​Extract Multiple Items

​Comparison: Scrape vs REST

​Next Steps

REST Functions

Composite Functions

Scrape Integration

Configuration

Sensor Configuration

CSS Selectors

Advanced Features

Use Cases

Best Practices

Debugging

Examples

Get Home Assistant Version

Scrape Product Price

Extract Multiple Items

Comparison: Scrape vs REST

Next Steps