Scrape functions extract data from HTML pages using CSS selectors, similar to Home Assistant’s scrape integration. They’re useful for websites without APIs.
Configuration
function:
  type: scrape
  resource: https://example.com
  value_template: "{{ value }}"
  sensor:
    - name: sensor_name
      select: ".css-selector"
      value_template: "{{ value }}"
resource: The URL of the web page to scrape
sensor: List of data points to extract from the page
value_template: Template that combines scraped sensor values into the function result
Sensor Configuration
Each sensor extracts a piece of data from the page:
name: Variable name to use in the main value_template
select: CSS selector to find the element
value_template: Template to process the selected element’s text
attribute: Extract an HTML attribute instead of text (e.g., href, src)
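As an illustration of attribute extraction, the sketch below reads a link target rather than the link text. The sensor name download_url and the selector a.download are hypothetical, and the attribute key is assumed to behave as it does in Home Assistant's scrape integration.

```yaml
sensor:
  # Hypothetical sensor: returns the href of the first matching link
  # instead of its visible text
  - name: download_url
    select: "a.download"
    attribute: href
    value_template: "{{ value }}"
```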
CSS Selectors
.classname          /* Select by class */
#idname             /* Select by ID */
tagname             /* Select by tag */
[attribute]         /* Select by attribute */
parent child        /* Descendant */
parent > child      /* Direct child */
element + sibling   /* Adjacent sibling */
element ~ sibling   /* General sibling */
[href^="https"]     /* Starts with */
[href$=".pdf"]      /* Ends with */
[href*="example"]   /* Contains */
[data-id="123"]     /* Exact match */
:first-child        /* First child */
:last-child         /* Last child */
:nth-child(2)       /* Nth child */
:not(.excluded)     /* Negation */
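These selectors can be combined inside a sensor's select value. The following is a minimal sketch; the sensor names, class names, and page structure are hypothetical and would need to match the page being scraped.

```yaml
sensor:
  # Direct-child combinator: a link that is an immediate child of td.title
  - name: first_title
    select: "td.title > a"
  # Attribute "ends with" match: a link whose href ends in .pdf
  - name: first_pdf
    select: 'a[href$=".pdf"]'
    attribute: href
  # Pseudo-classes: the second row, only if it lacks the hidden class
  - name: second_row
    select: "table.data tr:nth-child(2):not(.hidden)"
```

Single quotes around the second selector keep the inner double quotes from ending the YAML string early.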
Advanced Features
Extract all matching elements:
sensor:
  - name: all_prices
    select: ".price"
    index: all # Returns a list
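A list-valued sensor can then be used in the main value_template. The sketch below assumes the list reaches the template as an ordinary Jinja list, so the standard join filter applies; the URL is a placeholder.

```yaml
function:
  type: scrape
  resource: https://example.com/products  # hypothetical URL
  # all_prices is assumed to be a list, so it can be joined or indexed
  value_template: "Prices: {{ all_prices | join(', ') }}"
  sensor:
    - name: all_prices
      select: ".price"
      index: all
```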
Use Cases
Version Tracking: Monitor software versions and releases
Price Monitoring: Track product prices and availability
News Aggregation: Extract headlines and articles
Status Pages: Monitor service status pages
Best Practices
Inspect the page
Use browser DevTools to find the right CSS selectors. Right-click an element and select “Inspect” to see its HTML structure.
Handle missing elements
Pages may change. Use safe filters and defaults: {{ value | default('Not found') }}
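In a sensor definition, the fallback might look like the sketch below. The selector is illustrative. How the integration reports a selector that matches nothing is not specified here, so the second argument to default is included on the assumption that the value may arrive empty rather than undefined.

```yaml
sensor:
  - name: price
    select: ".price-current"
    # default(..., true) also replaces empty or None values,
    # not only undefined ones
    value_template: "{{ value | default('Not found', true) }}"
```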
Respect robots.txt
Check the website’s robots.txt to ensure scraping is allowed. Add appropriate delays for rate limiting.
Add User-Agent
Some sites block requests without a User-Agent header:
headers:
  User-Agent: "Home Assistant Extended OpenAI Conversation"
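For context, a complete function with the header set could look like the sketch below. It assumes headers sits at the same level as resource, as in Home Assistant's scrape and REST configuration; the title sensor is hypothetical.

```yaml
function:
  type: scrape
  resource: https://example.com
  # Assumed placement: alongside resource
  headers:
    User-Agent: "Home Assistant Extended OpenAI Conversation"
  value_template: "{{ title }}"
  sensor:
    - name: title  # hypothetical sensor
      select: "h1"
```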
Debugging
Test CSS selectors in browser console:
document.querySelector('.css-selector')
document.querySelectorAll('.css-selector')
Enable debug logging:
logger:
  logs:
    custom_components.extended_openai_conversation: debug
Examples
Get Home Assistant Version
Scrape the current version from home-assistant.io:
- spec:
    name: get_ha_version
    description: Use this function to get Home Assistant version
    parameters:
      type: object
      properties:
        dummy:
          type: string
          description: Not used (placeholder)
  function:
    type: scrape
    resource: https://www.home-assistant.io
    value_template: "version: {{version}}, release_date: {{release_date}}"
    sensor:
      - name: version
        select: ".current-version h1"
        value_template: '{{ value.split(":")[1] }}'
      - name: release_date
        select: ".release-date"
        value_template: '{{ value.lower() }}'
Scrape Product Price
- spec:
    name: get_product_price
    description: Get current price of a product
    parameters:
      type: object
      properties:
        url:
          type: string
          description: Product page URL
      required:
        - url
  function:
    type: scrape
    resource: "{{ url }}"
    value_template: "Price: {{ price }}, In Stock: {{ stock }}"
    sensor:
      - name: price
        select: ".price-current"
        value_template: '{{ value | replace("$", "") | float }}'
      - name: stock
        select: ".stock-status"
        value_template: '{{ "yes" if "in stock" in value.lower() else "no" }}'
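Real price strings often contain thousands separators or stray whitespace, which would defeat a bare float conversion. A slightly more defensive variant of the price sensor is sketched below; the fallback value of 0 is an arbitrary choice for illustration.

```yaml
sensor:
  - name: price
    select: ".price-current"
    # Strip the currency symbol and thousands separators, trim whitespace,
    # then convert; float(0) yields 0 if the remaining text is not a number
    value_template: '{{ value | replace("$", "") | replace(",", "") | trim | float(0) }}'
```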
Get News Headlines
- spec:
    name: get_news_headlines
    description: Get latest news headlines
    parameters:
      type: object
      properties:
        dummy:
          type: string
  function:
    type: scrape
    resource: https://news.ycombinator.com
    value_template: >-
      Top Stories:
      {% for i in range(1, 6) %}
      {{ i }}. {{ headlines[i-1] }}
      {% endfor %}
    sensor:
      - name: headlines
        select: ".titleline > a"
        value_template: '{{ value }}'
        index: all
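If the page yields fewer than five matches, a fixed range indexes past the end of the list. One possible alternative, assuming headlines is an ordinary Jinja list, is to iterate over a slice and number the entries with loop.index, so the loop simply stops early:

```yaml
value_template: >-
  Top Stories:
  {% for headline in headlines[:5] %}
  {{ loop.index }}. {{ headline }}
  {% endfor %}
```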
Comparison: Scrape vs REST
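| Feature       | Scrape                | REST                |
|---------------|-----------------------|---------------------|
| Data format   | HTML                  | JSON/XML            |
| Requires API  | No                    | Yes                 |
| Selector type | CSS                   | JSON path           |
| Best for      | Public websites       | APIs with keys      |
| Maintenance   | Higher (HTML changes) | Lower (stable APIs) |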