Bright Data provides a powerful Web Scraper API that extracts structured data from 44 popular domains, including e-commerce sites (Amazon, Walmart, eBay) and social media platforms (LinkedIn, Instagram, TikTok, Facebook). This makes it particularly useful for AI agents that need reliable, structured web data feeds.

Overview

Integration details

| Class | Package | Serializable | JS support | Version |
| :--- | :--- | :--- | :--- | :--- |
| BrightDataWebScraperAPI | langchain-brightdata | | | PyPI |

Tool features

| Native async | Returns artifact | Return data | Pricing |
| :--- | :--- | :--- | :--- |
| | | Structured data from websites (Amazon products, LinkedIn profiles, etc.) | Requires Bright Data account |

Setup

The integration lives in the langchain-brightdata package.
pip install langchain-brightdata
You’ll need a Bright Data API key to use this tool. You can set it as an environment variable:
import os

os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"
Or pass it directly when initializing the tool:
from langchain_brightdata import BrightDataWebScraperAPI

scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")

Instantiation

Here we show how to instantiate the BrightDataWebScraperAPI tool. This tool allows you to extract structured data from various websites, including Amazon product details, LinkedIn profiles, and more, using Bright Data's Dataset API. The tool accepts the following parameter during instantiation:
  • bright_data_api_key (required, str): Your Bright Data API key for authentication.

Invocation

Basic Usage

from langchain_brightdata import BrightDataWebScraperAPI

# Initialize the tool
scraper_tool = BrightDataWebScraperAPI(
    bright_data_api_key="your-api-key"  # Optional if set in environment variables
)

# Extract Amazon product data
results = scraper_tool.invoke(
    {"url": "https://www.amazon.com/dp/B08L5TNJHG", "dataset_type": "amazon_product"}
)

print(results)

Advanced Usage with Parameters

from langchain_brightdata import BrightDataWebScraperAPI

# Initialize with default parameters
scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")

# Extract Amazon product data with location-specific pricing
results = scraper_tool.invoke(
    {
        "url": "https://www.amazon.com/dp/B08L5TNJHG",
        "dataset_type": "amazon_product",
        "zipcode": "10001",  # Get pricing for New York City
    }
)

print(results)

# Extract LinkedIn profile data
linkedin_results = scraper_tool.invoke(
    {
        "url": "https://www.linkedin.com/in/satyanadella/",
        "dataset_type": "linkedin_person_profile",
    }
)

print(linkedin_results)

Customization Options

The BrightDataWebScraperAPI tool accepts several parameters for customization:
| Parameter | Type | Description |
| :--- | :--- | :--- |
| url | str | The URL to extract data from |
| dataset_type | str | Type of dataset to use (see available types below) |
| zipcode | str | Optional zipcode for location-specific data |
| keyword | str | Search keyword (required for amazon_product_search) |
| first_name | str | First name (required for linkedin_people_search) |
| last_name | str | Last name (required for linkedin_people_search) |
| num_of_reviews | str | Number of reviews (required for facebook_company_reviews) |
| num_of_comments | str | Number of comments (for youtube_comments, default: 10) |
| days_limit | str | Days to limit results (for google_maps_reviews, default: 3) |
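The required-versus-default split in this table can be sketched as a small payload builder. EXTRA_REQUIRED, DEFAULTS, and build_payload are hypothetical helpers for illustration only; the package performs its own input validation:

```python
# Hypothetical encoding of the per-dataset input rules from the table
# above (illustrative only, not part of langchain-brightdata).
EXTRA_REQUIRED = {
    "amazon_product_search": {"keyword"},
    "linkedin_people_search": {"first_name", "last_name"},
    "facebook_company_reviews": {"num_of_reviews"},
}
DEFAULTS = {
    "youtube_comments": {"num_of_comments": "10"},
    "google_maps_reviews": {"days_limit": "3"},
}

def build_payload(url: str, dataset_type: str, **kwargs) -> dict:
    # Fail early if a dataset type's extra required inputs are missing.
    missing = EXTRA_REQUIRED.get(dataset_type, set()) - kwargs.keys()
    if missing:
        raise ValueError(f"{dataset_type} also requires: {sorted(missing)}")
    payload = {"url": url, "dataset_type": dataset_type}
    payload.update(DEFAULTS.get(dataset_type, {}))  # apply documented defaults
    payload.update(kwargs)  # explicit values override defaults
    return payload

print(build_payload("https://example.com", "youtube_comments"))
```

Note that the table types every optional count (num_of_comments, days_limit) as str, so the defaults above are kept as strings.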

Available dataset types (44 datasets)

E-commerce (10 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| amazon_product | Product details, pricing, specs | url (with /dp/) |
| amazon_product_reviews | Customer reviews and ratings | url (with /dp/) |
| amazon_product_search | Search results from Amazon | keyword, url |
| walmart_product | Walmart product data | url (with /ip/) |
| walmart_seller | Walmart seller information | url |
| ebay_product | eBay product data | url |
| homedepot_products | Home Depot product data | url |
| zara_products | Zara product data | url |
| etsy_products | Etsy product data | url |
| bestbuy_products | Best Buy product data | url |

LinkedIn (5 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| linkedin_person_profile | Professional profile data | url |
| linkedin_company_profile | Company information | url |
| linkedin_job_listings | Job listing details | url |
| linkedin_posts | Post content and engagement | url |
| linkedin_people_search | Search for people | url, first_name, last_name |
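Of the LinkedIn datasets, only linkedin_people_search needs inputs beyond a url. A minimal pair of payloads looks like this (the names and the search URL are illustrative):

```python
# linkedin_people_search requires first_name and last_name in addition
# to the url (example values are illustrative).
search_payload = {
    "url": "https://www.linkedin.com/",
    "dataset_type": "linkedin_people_search",
    "first_name": "Jane",
    "last_name": "Doe",
}

# The other LinkedIn dataset types only need a url.
profile_payload = {
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile",
}

print(search_payload["dataset_type"])  # prints: linkedin_people_search
```

Either dict can be passed directly to scraper_tool.invoke(...).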

Business intelligence (2 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| crunchbase_company | Company funding, investors, metrics | url |
| zoominfo_company_profile | B2B company intelligence | url |

Instagram (4 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| instagram_profiles | Profile data and stats | url |
| instagram_posts | Post content and engagement | url |
| instagram_reels | Reel content and metrics | url |
| instagram_comments | Comments on posts | url |

Facebook (4 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| facebook_posts | Post content and engagement | url |
| facebook_marketplace_listings | Marketplace listing data | url |
| facebook_company_reviews | Company reviews | url, num_of_reviews |
| facebook_events | Event details | url |

TikTok (4 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| tiktok_profiles | Profile data and stats | url |
| tiktok_posts | Video content and metrics | url |
| tiktok_shop | Shop product data | url |
| tiktok_comments | Comments on videos | url |

YouTube (3 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| youtube_profiles | Channel profile data | url |
| youtube_videos | Video content and metrics | url |
| youtube_comments | Comments on videos | url, num_of_comments (default: 10) |

Google (3 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| google_maps_reviews | Business reviews from Maps | url, days_limit (default: 3) |
| google_shopping | Shopping product data | url |
| google_play_store | App store data | url |

Other platforms (9 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| apple_app_store | iOS app data | url |
| x_posts | X (Twitter) post data | url |
| reddit_posts | Reddit post data | url |
| github_repository_file | GitHub file content | url |
| yahoo_finance_business | Financial business data | url |
| reuter_news | News article data | url |
| zillow_properties_listing | Real estate listing data | url |
| booking_hotel_listings | Hotel listing data | url |

Use within an agent

from langchain_brightdata import BrightDataWebScraperAPI
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.agents import create_agent


# Initialize the LLM
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key="your-api-key")

# Initialize the Bright Data Web Scraper API tool
scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")

# Create the agent with the tool
agent = create_agent(llm, [scraper_tool])

# Provide a user query
user_input = "Scrape Amazon product data for https://www.amazon.com/dp/B0D2Q9397Y?th=1 in New York (zipcode 10001)."

# Stream the agent's step-by-step output
for step in agent.stream(
    {"messages": user_input},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

API reference

