Bright Data provides a powerful Web Scraper API that extracts structured data from 44 popular domains, including e-commerce sites (Amazon, Walmart, eBay) and social media platforms (LinkedIn, Instagram, TikTok, Facebook). This makes it particularly useful for AI agents that need reliable, structured web data feeds.

Overview

Integration details

| Class | Package | Serializable | JS support | Version |
| :--- | :--- | :--- | :--- | :--- |
| BrightDataWebScraperAPI | langchain-brightdata | | | PyPI |

Tool features

| Native async | Returns artifact | Return data | Pricing |
| :--- | :--- | :--- | :--- |
| | | Structured data from websites (Amazon products, LinkedIn profiles, etc.) | Requires Bright Data account |

Setup

The integration lives in the langchain-brightdata package.
pip install langchain-brightdata
You’ll need a Bright Data API key to use this tool. You can set it as an environment variable:
import os

os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"
Or pass it directly when initializing the tool:
from langchain_brightdata import BrightDataWebScraperAPI

scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")

Instantiation

Here we show how to instantiate the BrightDataWebScraperAPI tool. This tool allows you to extract structured data from various websites, including Amazon product details, LinkedIn profiles, and more, using Bright Data's Dataset API. The tool accepts the following parameter during instantiation:
  • bright_data_api_key (required, str): Your Bright Data API key for authentication.

Invocation

Basic Usage

from langchain_brightdata import BrightDataWebScraperAPI

# Initialize the tool
scraper_tool = BrightDataWebScraperAPI(
    bright_data_api_key="your-api-key"  # Optional if set in environment variables
)

# Extract Amazon product data
results = scraper_tool.invoke(
    {"url": "https://www.amazon.com/dp/B08L5TNJHG", "dataset_type": "amazon_product"}
)

print(results)

Advanced Usage with Parameters

from langchain_brightdata import BrightDataWebScraperAPI

# Initialize with default parameters
scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")

# Extract Amazon product data with location-specific pricing
results = scraper_tool.invoke(
    {
        "url": "https://www.amazon.com/dp/B08L5TNJHG",
        "dataset_type": "amazon_product",
        "zipcode": "10001",  # Get pricing for New York City
    }
)

print(results)

# Extract LinkedIn profile data
linkedin_results = scraper_tool.invoke(
    {
        "url": "https://www.linkedin.com/in/satyanadella/",
        "dataset_type": "linkedin_person_profile",
    }
)

print(linkedin_results)

Customization Options

The BrightDataWebScraperAPI tool accepts several parameters for customization:
| Parameter | Type | Description |
| :--- | :--- | :--- |
| url | str | The URL to extract data from |
| dataset_type | str | Type of dataset to use (see available types below) |
| zipcode | str | Optional zipcode for location-specific data |
| keyword | str | Search keyword (required for amazon_product_search) |
| first_name | str | First name (required for linkedin_people_search) |
| last_name | str | Last name (required for linkedin_people_search) |
| num_of_reviews | str | Number of reviews (required for facebook_company_reviews) |
| num_of_comments | str | Number of comments (for youtube_comments, default: 10) |
| days_limit | str | Days to limit results (for google_maps_reviews, default: 3) |
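The required-versus-default split in this table can be sketched as a small payload builder. EXTRA_REQUIRED, DEFAULTS, and build_payload are hypothetical helpers for illustration only; the package performs its own input validation:

```python
# Hypothetical encoding of the per-dataset input rules from the table
# above (illustrative only, not part of langchain-brightdata).
EXTRA_REQUIRED = {
    "amazon_product_search": {"keyword"},
    "linkedin_people_search": {"first_name", "last_name"},
    "facebook_company_reviews": {"num_of_reviews"},
}
DEFAULTS = {
    "youtube_comments": {"num_of_comments": "10"},
    "google_maps_reviews": {"days_limit": "3"},
}

def build_payload(url: str, dataset_type: str, **kwargs) -> dict:
    # Fail early if a dataset type's extra required inputs are missing.
    missing = EXTRA_REQUIRED.get(dataset_type, set()) - kwargs.keys()
    if missing:
        raise ValueError(f"{dataset_type} also requires: {sorted(missing)}")
    payload = {"url": url, "dataset_type": dataset_type}
    payload.update(DEFAULTS.get(dataset_type, {}))  # apply documented defaults
    payload.update(kwargs)  # explicit values override defaults
    return payload

print(build_payload("https://example.com", "youtube_comments"))
```

Note that the table types every optional count (num_of_comments, days_limit) as str, so the defaults above are kept as strings.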

Available dataset types (44 datasets)

E-commerce (10 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| amazon_product | Product details, pricing, specs | url (with /dp/) |
| amazon_product_reviews | Customer reviews and ratings | url (with /dp/) |
| amazon_product_search | Search results from Amazon | keyword, url |
| walmart_product | Walmart product data | url (with /ip/) |
| walmart_seller | Walmart seller information | url |
| ebay_product | eBay product data | url |
| homedepot_products | Home Depot product data | url |
| zara_products | Zara product data | url |
| etsy_products | Etsy product data | url |
| bestbuy_products | Best Buy product data | url |

LinkedIn (5 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| linkedin_person_profile | Professional profile data | url |
| linkedin_company_profile | Company information | url |
| linkedin_job_listings | Job listing details | url |
| linkedin_posts | Post content and engagement | url |
| linkedin_people_search | Search for people | url, first_name, last_name |
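Of the LinkedIn datasets, only linkedin_people_search needs inputs beyond a url. A minimal pair of payloads looks like this (the names and the search URL are illustrative):

```python
# linkedin_people_search requires first_name and last_name in addition
# to the url (example values are illustrative).
search_payload = {
    "url": "https://www.linkedin.com/",
    "dataset_type": "linkedin_people_search",
    "first_name": "Jane",
    "last_name": "Doe",
}

# The other LinkedIn dataset types only need a url.
profile_payload = {
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile",
}

print(search_payload["dataset_type"])  # prints: linkedin_people_search
```

Either dict can be passed directly to scraper_tool.invoke(...).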

Business intelligence (2 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| crunchbase_company | Company funding, investors, metrics | url |
| zoominfo_company_profile | B2B company intelligence | url |

Instagram (4 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| instagram_profiles | Profile data and stats | url |
| instagram_posts | Post content and engagement | url |
| instagram_reels | Reel content and metrics | url |
| instagram_comments | Comments on posts | url |

Facebook (4 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| facebook_posts | Post content and engagement | url |
| facebook_marketplace_listings | Marketplace listing data | url |
| facebook_company_reviews | Company reviews | url, num_of_reviews |
| facebook_events | Event details | url |

TikTok (4 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| tiktok_profiles | Profile data and stats | url |
| tiktok_posts | Video content and metrics | url |
| tiktok_shop | Shop product data | url |
| tiktok_comments | Comments on videos | url |

YouTube (3 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| youtube_profiles | Channel profile data | url |
| youtube_videos | Video content and metrics | url |
| youtube_comments | Comments on videos | url, num_of_comments (default: 10) |

Google (3 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| google_maps_reviews | Business reviews from Maps | url, days_limit (default: 3) |
| google_shopping | Shopping product data | url |
| google_play_store | App store data | url |

Other platforms (9 datasets)

| Dataset Type | Description | Required Inputs |
| :--- | :--- | :--- |
| apple_app_store | iOS app data | url |
| x_posts | X (Twitter) post data | url |
| reddit_posts | Reddit post data | url |
| github_repository_file | GitHub file content | url |
| yahoo_finance_business | Financial business data | url |
| reuter_news | News article data | url |
| zillow_properties_listing | Real estate listing data | url |
| booking_hotel_listings | Hotel listing data | url |

Use within an agent

from langchain_brightdata import BrightDataWebScraperAPI
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.agents import create_agent


# Initialize the LLM
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key="your-api-key")

# Initialize the Bright Data Web Scraper API tool
scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")

# Create the agent with the tool
agent = create_agent(llm, [scraper_tool])

# Provide a user query
user_input = "Scrape Amazon product data for https://www.amazon.com/dp/B0D2Q9397Y?th=1 in New York (zipcode 10001)."

# Stream the agent's step-by-step output
for step in agent.stream(
    {"messages": user_input},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

API reference

