Scrape functions extract data from HTML pages using CSS selectors, similar to Home Assistant’s scrape integration. They’re useful for websites without APIs.

Configuration

function:
  type: scrape
  resource: https://example.com
  value_template: "{{ value }}"
  sensor:
    - name: sensor_name
      select: ".css-selector"
      value_template: "{{ value }}"
resource (string, required): The URL of the web page to scrape.
sensor (array, required): List of data points to extract from the page.
value_template (string, required): Template that combines the scraped sensor values into the function result.

Sensor Configuration

Each sensor extracts a piece of data from the page:
name (string, required): Variable name to use in the main value_template.
select (string, required): CSS selector used to find the element.
value_template (string): Template to process the selected element’s text.
attribute (string): Extract an HTML attribute instead of text (e.g., href, src).
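
As a concrete illustration of how sensor names feed the top-level value_template, the sketch below defines two sensors on a hypothetical blog page. The URL and both selectors are assumptions made up for this example. The second sensor uses attribute to return the link's href instead of its text.

```yaml
function:
  type: scrape
  resource: https://example.com/blog     # hypothetical page
  # Each sensor name becomes a variable in this template
  value_template: "Latest post: {{ title }} ({{ link }})"
  sensor:
    - name: title
      select: "article h2"               # hypothetical selector
      value_template: "{{ value | trim }}"
    - name: link
      select: "article h2 > a"           # hypothetical selector
      attribute: href                    # return the href attribute, not the text
```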

CSS Selectors

.classname         /* Select by class */
#idname            /* Select by ID */
tagname            /* Select by tag */
[attribute]        /* Select by attribute */
parent child       /* Descendant */
parent > child     /* Direct child */
element + sibling  /* Adjacent sibling */
element ~ sibling  /* General sibling */
[href^="https"]    /* Starts with */
[href$=".pdf"]     /* Ends with */
[href*="example"]  /* Contains */
[data-id="123"]    /* Exact match */
:first-child       /* First child */
:last-child        /* Last child */
:nth-child(2)      /* Nth child */
:not(.excluded)    /* Negation */
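
These selectors go directly into a sensor's select field. The following is a minimal sketch, assuming a hypothetical page with a release list and a PDF download link. The sensor names and selectors are illustrative only. Selectors that contain double quotes are wrapped in single quotes so the YAML stays valid.

```yaml
sensor:
  # Direct-child combinator plus :first-child: first item of a list
  - name: newest_release
    select: "ul.releases > li:first-child"
  # Attribute "ends with" match; single-quoted because the selector
  # itself contains double quotes
  - name: manual_pdf
    select: 'a[href$=".pdf"]'
    attribute: href
  # Negation: paragraphs that do not carry the "ad" class
  - name: first_paragraph
    select: "p:not(.ad)"
```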

Advanced Features

Get HTML attributes instead of text:
sensor:
  - name: image_url
    select: "img.product-image"
    attribute: src
  - name: link
    select: "a.read-more"
    attribute: href
Extract all matching elements:
sensor:
  - name: all_prices
    select: ".price"
    index: all  # Returns a list
Add custom headers (e.g., for authentication):
function:
  type: scrape
  resource: https://example.com
  headers:
    User-Agent: "Mozilla/5.0"
    Cookie: "session=abc123"
  value_template: "{{ data }}"
  sensor:
    - name: data
      select: ".content"

Use Cases

Version Tracking: Monitor software versions and releases
Price Monitoring: Track product prices and availability
News Aggregation: Extract headlines and articles
Status Pages: Monitor service status pages

Best Practices

1. Inspect the page

Use browser DevTools to find the right CSS selectors. Right-click an element and select “Inspect” to see its HTML structure.

2. Handle missing elements

Pages change over time, so a selector may stop matching. Use filters with default values:
{{ value | default('Not found') }}

3. Respect robots.txt

Check the website’s robots.txt to confirm that scraping is allowed, and keep request frequency low to avoid being rate limited.

4. Add a User-Agent

Some sites block requests without a User-Agent header:
headers:
  User-Agent: "Home Assistant Extended OpenAI Conversation"
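
The fallback and User-Agent practices can be combined in a single function. The following is a sketch under assumptions: the URL, selector, and sensor name are hypothetical. Passing true as the second argument to Jinja's default filter also replaces empty strings, which helps when the element exists but has no text. Whether a selector that matches nothing reaches the template at all depends on the integration, so treat the default as a safeguard rather than a guarantee.

```yaml
function:
  type: scrape
  resource: https://example.com/status   # hypothetical page
  headers:
    User-Agent: "Home Assistant Extended OpenAI Conversation"
  value_template: "Status: {{ status }}"
  sensor:
    - name: status
      select: ".status-text"             # hypothetical selector
      # Fall back to 'Not found' when the value is undefined or empty
      value_template: "{{ value | default('Not found', true) | trim }}"
```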

Debugging

Test CSS selectors in browser console:
document.querySelector('.css-selector')
document.querySelectorAll('.css-selector')
Enable debug logging:
logger:
  logs:
    custom_components.extended_openai_conversation: debug

Examples

Get Home Assistant Version

Scrape the current version from home-assistant.io:
- spec:
    name: get_ha_version
    description: Use this function to get Home Assistant version
    parameters:
      type: object
      properties:
        dummy:
          type: string
          description: Not used (placeholder)
  function:
    type: scrape
    resource: https://www.home-assistant.io
    value_template: "version: {{version}}, release_date: {{release_date}}"
    sensor:
      - name: version
        select: ".current-version h1"
        value_template: '{{ value.split(":")[1] }}'
      - name: release_date
        select: ".release-date"
        value_template: '{{ value.lower() }}'

Scrape Product Price

- spec:
    name: get_product_price
    description: Get current price of a product
    parameters:
      type: object
      properties:
        url:
          type: string
          description: Product page URL
      required:
      - url
  function:
    type: scrape
    resource: "{{ url }}"
    value_template: "Price: {{ price }}, In Stock: {{ stock }}"
    sensor:
      - name: price
        select: ".price-current"
        value_template: '{{ value | replace("$", "") | float }}'
      - name: stock
        select: ".stock-status"
        value_template: '{{ "yes" if "in stock" in value.lower() else "no" }}'

Extract Multiple Items

- spec:
    name: get_news_headlines
    description: Get latest news headlines
    parameters:
      type: object
      properties:
        dummy:
          type: string
  function:
    type: scrape
    resource: https://news.ycombinator.com
    value_template: >-
      Top Stories:
      {% for i in range(1, 6) %}
      {{ i }}. {{ headlines[i-1] }}
      {% endfor %}
    sensor:
      - name: headlines
        select: ".titleline > a"
        value_template: '{{ value }}'
        index: all
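
The loop in this example assumes that at least five headlines were matched. A variant that tolerates shorter lists, assuming headlines is a list as described for index: all, slices the list and uses Jinja's loop.index for numbering. Only the value_template changes; the rest of the function stays the same.

```yaml
value_template: >-
  Top Stories:
  {% for headline in headlines[:5] %}
  {{ loop.index }}. {{ headline }}
  {% endfor %}
```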

Comparison: Scrape vs REST

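| Feature       | Scrape                | REST                |
|---------------|-----------------------|---------------------|
| Data format   | HTML                  | JSON/XML            |
| Requires API  | No                    | Yes                 |
| Selector type | CSS                   | JSON path           |
| Best for      | Public websites       | APIs with keys      |
| Maintenance   | Higher (HTML changes) | Lower (stable APIs) |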

Next Steps

REST Functions: Use APIs when available
Composite Functions: Combine scraping with processing
Scrape Integration: HA scrape integration docs