Colab Setup and Async Isolation for Reliable Launches

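Install the packages with pip install cloakbrowser playwright pandas beautifulsoup4, then run playwright install-deps chromium to install Chromium's system-level runtime dependencies. Prepare the stealth binary with ensure_binary() and verify it with binary_info(). Colab already runs an asyncio event loop in the notebook's main thread, and Playwright's sync API refuses to run inside a running loop, so calls such as launch(), launch_context(), and launch_persistent_context() fail when invoked directly in a cell. Wrap them in a ThreadPoolExecutor so they run in a separate thread that has no loop: executor.submit(fn).result(). Because Playwright sync objects should stay on the thread that created them, fn should launch, drive, and close the browser itself. This enables headless launches with headless=True, humanize=True (anti-detection), and args such as --no-sandbox and --disable-dev-shm-usage. The working directory /content/cloakbrowser_advanced_tutorial stores screenshots, storage_state.json, and profile directories.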
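The thread-isolation idea can be sketched with the standard library alone. The helper names run_in_thread and loop_is_running are hypothetical, not part of CloakBrowser or Playwright; the stand-in function only checks for an event loop, so the snippet runs without a browser.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_in_thread(fn, *args, **kwargs):
    """Run a blocking callable in a fresh worker thread and return its result."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        # .result() blocks until fn finishes and re-raises any exception it threw.
        return executor.submit(fn, *args, **kwargs).result()


def loop_is_running():
    """Report whether the calling thread has a running asyncio event loop."""
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False


# The worker thread starts without an event loop, which is the condition
# sync-only browser code needs. This holds in Colab and in a plain script.
worker_has_loop = run_in_thread(loop_is_running)
print("event loop in worker thread:", worker_has_loop)
```

In a notebook cell, the same helper would wrap a function that calls launch(...), uses the page, and closes the browser, all inside the worker thread.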

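Basic launch: browser = launch(...), open a page, and call page.goto('https://example.com', wait_until='domcontentloaded', timeout=60000). Then read the page title, a 300-character preview of the body text, and the final URL. Always call safe_close() in a finally block so the browser is released even when navigation fails.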
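The text names safe_close() but does not show its body, so the version below is an assumption about what such a helper might look like. FakeBrowser is a hypothetical stand-in that lets the try/finally pattern run without Chromium.

```python
def safe_close(resource):
    """Close a browser, context, or page; return False instead of raising."""
    if resource is None:
        return False
    try:
        resource.close()
        return True
    except Exception:
        # An already-closed or crashed browser should not mask the original error.
        return False


class FakeBrowser:
    """Hypothetical stand-in for a real browser handle."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


browser = None
error = None
try:
    browser = FakeBrowser()
    # Simulate a failure partway through the session.
    raise TimeoutError("simulated navigation timeout")
except TimeoutError as exc:
    error = str(exc)
finally:
    # Runs on success and on failure, so the handle never leaks.
    safe_close(browser)
```

Initializing browser to None before the try block matters: if the launch itself fails, the finally clause still has a defined name to pass to safe_close().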

Custom Contexts for Realistic Browser Simulation

Use launch_context(headless=True, humanize=True, viewport={'width':1365,'height':768}, locale='en-US', timezone_id='America/New_York', color_scheme='light', extra_http_headers={'Accept-Language':'en-US,en;q=0.9', 'X-Tutorial-Run':'cloakbrowser-colab'}). Navigate to data: URL test pages so that form interaction stays local and touches no third-party site: fill #name with "CloakBrowser Colab User" and #message with "We are testing...", click #submit, then call wait_for_timeout(1000). Save the session with context.storage_state(path='storage_state.json') and capture a PNG screenshot with full_page=True.
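A data: URL test page like the one described above can be built with the standard library. The HTML here is a hypothetical minimal form that reuses the #name, #message, and #submit selectors from the text; the original tutorial's page may differ.

```python
from urllib.parse import quote

# Hypothetical test page; only the element ids come from the text above.
FORM_HTML = """<!doctype html>
<html>
  <head><title>CloakBrowser Test Page</title></head>
  <body>
    <h1>Test Form</h1>
    <input id="name" type="text">
    <textarea id="message"></textarea>
    <button id="submit" onclick="document.title = 'Submitted'">Submit</button>
  </body>
</html>"""


def html_to_data_url(html: str) -> str:
    """Percent-encode an HTML document into a data: URL a browser can load."""
    # quote() escapes spaces, quotes, angle brackets, and '#', which would
    # otherwise be read as a URL fragment and truncate the page.
    return "data:text/html;charset=utf-8," + quote(html)


test_url = html_to_data_url(FORM_HTML)
print(test_url[:60])
```

With a live page, the resulting URL would be passed to page.goto(test_url), followed by page.fill('#name', ...) and page.click('#submit').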

Restore in a new context: launch_context(..., storage_state='storage_state.json'), then verify that localStorage entries such as tutorial_name persist via page.evaluate("() => localStorage.getItem('tutorial_name')"). This demonstrates session continuity without the overhead of a full profile directory.
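Before restoring, it can help to check what storage_state.json actually holds. The sketch below reads a file in the shape Playwright writes (a cookies list plus an origins list with per-origin localStorage entries). The helper name local_storage_value and the sample origin and value are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def local_storage_value(state_path, origin, key):
    """Return the saved localStorage value for key at origin, or None if absent."""
    state = json.loads(Path(state_path).read_text(encoding="utf-8"))
    for entry in state.get("origins", []):
        if entry.get("origin") != origin:
            continue
        for item in entry.get("localStorage", []):
            if item.get("name") == key:
                return item.get("value")
    return None


# Sample data only: a real file would come from context.storage_state(path=...).
sample_state = {
    "cookies": [],
    "origins": [
        {
            "origin": "https://example.com",
            "localStorage": [{"name": "tutorial_name", "value": "demo-value"}],
        }
    ],
}
state_file = Path(tempfile.mkdtemp()) / "storage_state.json"
state_file.write_text(json.dumps(sample_state), encoding="utf-8")

found = local_storage_value(state_file, "https://example.com", "tutorial_name")
missing = local_storage_value(state_file, "https://example.com", "no_such_key")
print(found, missing)
```

If the key is missing from the file, the restored context will not have it either, which narrows a failed persistence check down to the save step.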

Persistent Profiles Across Restarts

launch_persistent_context(str(PROFILE_DIR), ...) creates a directory-backed profile that survives ctx.close() and later relaunches. First run: page.evaluate("localStorage.setItem('persistent_profile_demo', 'saved_across_browser_restarts')"). Second run: confirm that the value and its timestamp (generated with new Date().toISOString()) match, which the demo reports as persisted_successfully: true. The persistence demo uses viewport={'width':1280,'height':720}. Clear the directory with shutil.rmtree(PROFILE_DIR) before each test so that stale state cannot produce a false pass. The profile persists localStorage automatically, which suits long-running automations.
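The bookkeeping around the persistence check can be sketched without a browser. reset_profile_dir and persistence_report are hypothetical helper names, the temporary directory stands in for the real PROFILE_DIR, and the timestamp is a placeholder rather than a recorded result.

```python
import shutil
import tempfile
from pathlib import Path


def reset_profile_dir(profile_dir: Path) -> Path:
    """Delete any previous profile and recreate an empty directory."""
    # ignore_errors=True makes the call safe when the directory does not exist yet.
    shutil.rmtree(profile_dir, ignore_errors=True)
    profile_dir.mkdir(parents=True, exist_ok=True)
    return profile_dir


def persistence_report(written: dict, read_back: dict) -> dict:
    """Compare what the first run stored with what the second run read."""
    return {
        "written": written,
        "read_back": read_back,
        "persisted_successfully": written == read_back,
    }


# Temporary location so the sketch does not touch a real profile.
PROFILE_DIR = Path(tempfile.mkdtemp()) / "persistent_profile"
PROFILE_DIR.mkdir()
(PROFILE_DIR / "stale.txt").write_text("left over from an earlier run")
reset_profile_dir(PROFILE_DIR)

# Placeholder values; in practice both dicts come from page.evaluate() calls.
first_run = {"value": "saved_across_browser_restarts", "saved_at": "2024-01-01T00:00:00.000Z"}
second_run = dict(first_run)
report = persistence_report(first_run, second_run)
print(report["persisted_successfully"])
```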

Stealth Signal Inspection and Content Extraction

The test page's JavaScript collects more than 15 signals: navigator.webdriver (false under stealth), userAgent, platform, languages, hardwareConcurrency, deviceMemory, pluginsLength, chromeObjectPresent (true), timezone, screen {width, height, colorDepth: 24, pixelDepth: 24}, viewport {innerWidth, innerHeight, devicePixelRatio}, webglVendor and webglRenderer (masked), and localStorageWorks (true). Extract them via page.evaluate('() => collectSignals()').
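Once the signals dictionary is back in Python, a few simple checks can flag obvious automation tells. The function name flag_signals, the specific checks, and the sample values are illustrative assumptions, not a complete detection model and not output from a real run.

```python
def flag_signals(signals: dict) -> list:
    """Return human-readable warnings for signals that commonly reveal automation."""
    flags = []
    if signals.get("webdriver"):
        flags.append("navigator.webdriver is true")
    if "HeadlessChrome" in (signals.get("userAgent") or ""):
        flags.append("userAgent contains HeadlessChrome")
    if not signals.get("languages"):
        flags.append("navigator.languages is empty")
    if not signals.get("chromeObjectPresent"):
        flags.append("window.chrome is missing")
    if not signals.get("localStorageWorks"):
        flags.append("localStorage is unavailable")
    return flags


# Made-up sample resembling what collectSignals() might return.
clean = {
    "webdriver": False,
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0.0.0 Safari/537.36",
    "languages": ["en-US", "en"],
    "chromeObjectPresent": True,
    "localStorageWorks": True,
}
# Same sample with two tells switched on.
leaky = dict(clean, webdriver=True, languages=[])

print(flag_signals(clean))
print(flag_signals(leaky))
```

An empty list only means none of these particular checks fired; real detection systems look at many more signals.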

Capture rendered content with page.title(), page.locator('h1').inner_text(timeout=15000), and page.content(). Parse the static HTML with BeautifulSoup: soup.title.get_text(), soup.find('h1'), and a list of links as [{text, href}]. Comparing rendered against static output reveals what JavaScript changed on the page. A pandas table summarizes the run: signals (e.g., webdriver=false, pluginsLength=null), the persistence result (true), and outputs such as screenshot_path. Together these steps form a pipeline that launches with stealth settings and extracts parseable data.
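The static-parsing step might look like the sketch below. It assumes beautifulsoup4 is installed, as in the install command earlier; summarize_html is a hypothetical helper, and the HTML is a hand-written sample rather than a captured page.

```python
from bs4 import BeautifulSoup


def summarize_html(html: str) -> dict:
    """Extract the title, first h1, and all links from an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    return {
        "title": soup.title.get_text(strip=True) if soup.title else None,
        "h1": h1.get_text(strip=True) if h1 else None,
        "links": [
            {"text": a.get_text(strip=True), "href": a["href"]}
            for a in soup.find_all("a", href=True)
        ],
    }


# Hand-written sample; in practice this string would come from page.content().
static_html = """<html><head><title>Example Domain</title></head>
<body><h1>Example Domain</h1>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</body></html>"""

summary = summarize_html(static_html)
print(summary)
```

Running the same function on the raw HTTP response and on page.content() gives two dictionaries whose differences point at JavaScript-driven changes; either can be passed to pandas.DataFrame([summary]) for the summary table.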