WebScraper class
A web scraper that uses proxies to avoid detection and blocking
- Available extensions: WebScraperExtension
Constructors
-
WebScraper({required ProxyManager proxyManager, ProxyHttpClient? httpClient, String? defaultUserAgent, Map<String, String>? defaultHeaders, int defaultTimeout = 30000, int maxRetries = 3, AdaptiveScrapingStrategy? adaptiveStrategy, SiteReputationTracker? reputationTracker, ScrapingLogger? logger})
Creates a new WebScraper with the given parameters
Properties
- hashCode → int
-
The hash code for this object.
no setter, inherited
- logger → ScrapingLogger
-
Gets the scraping logger
no setter
- proxyManager → ProxyManager
-
The proxy manager for getting proxies
final
- reputationTracker → SiteReputationTracker
-
Gets the site reputation tracker
no setter
- runtimeType → Type
-
A representation of the runtime type of the object.
no setter, inherited
Methods
-
close() → void
Closes the HTTP client
-
extractData({required String html, required String selector, String? attribute, bool asText = true}) → List<String>
Parses HTML content and extracts data using CSS selectors
-
extractStructuredData({required String html, required Map<String, String> selectors, Map<String, String?>? attributes}) → List<Map<String, String>>
Parses HTML content and extracts structured data using CSS selectors
-
fetchFromProblematicSite({required String url, Map<String, String>? headers, int? timeout = 60000, int? retries = 5}) → Future<String>
Available on WebScraper, provided by the WebScraperExtension extension
Fetches HTML content from a problematic site using specialized techniques
-
fetchHtml({required String url, Map<String, String>? headers, int? timeout, int? retries}) → Future<String>
Fetches HTML content from the given URL
-
fetchHtmlWithRetry({required String url, Map<String, String>? headers, int? timeout, int? retries, int initialBackoffMs = 500, double backoffMultiplier = 1.5, int maxBackoffMs = 10000}) → Future<String>
Available on WebScraper, provided by the WebScraperExtension extension
Fetches HTML content with enhanced error handling and retry logic
-
fetchJson({required String url, Map<String, String>? headers, int? timeout, int? retries}) → Future<Map<String, dynamic>>
Fetches JSON content from the given URL
-
noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
-
toString() → String
A string representation of this object.
inherited
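The backoff parameters of fetchHtmlWithRetry (initialBackoffMs, backoffMultiplier, maxBackoffMs) suggest a capped exponential backoff between attempts. This page does not document the exact schedule, so the following is only a minimal sketch under that assumption. The helper name backoffSchedule is hypothetical and is not part of WebScraper.

```dart
import 'dart:math' as math;

/// Hypothetical helper (not part of WebScraper): returns the delay in
/// milliseconds before each retry, assuming the delay starts at
/// [initialBackoffMs], is multiplied by [backoffMultiplier] after every
/// attempt, and is capped at [maxBackoffMs].
List<int> backoffSchedule({
  required int retries,
  int initialBackoffMs = 500,
  double backoffMultiplier = 1.5,
  int maxBackoffMs = 10000,
}) {
  final delays = <int>[];
  var delay = initialBackoffMs.toDouble();
  for (var attempt = 0; attempt < retries; attempt++) {
    // Round to whole milliseconds, then apply the cap.
    delays.add(math.min(delay.round(), maxBackoffMs));
    delay *= backoffMultiplier;
  }
  return delays;
}

void main() {
  // Uses the default values shown in the fetchHtmlWithRetry signature.
  print(backoffSchedule(retries: 5)); // [500, 750, 1125, 1688, 2531]
}
```

Under this assumed schedule, the defaults reach the 10000 cap on the ninth retry. A larger backoffMultiplier reaches the cap sooner, and a smaller maxBackoffMs shortens the longest wait.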
Operators
-
operator ==(Object other) → bool
The equality operator.
inherited