
  • How can I scrape structured data from sites without standard HTML tags?

    Posted by Achim Antioco on 11/16/2024 at 6:34 am

    CSS and XPath selectors can match elements by tag name, attribute, or position, so they still work when a site uses custom tags instead of standard ones. I write a separate selector for each element I need.
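
    A minimal sketch of selecting on custom tag names, using only the standard library's ElementTree and its XPath subset. The markup, the tag names (`product-card`, `product-name`, `product-price`) and the `sku` attribute are made up for illustration, and the snippet assumes the markup is well-formed; real pages usually need a more forgiving HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: custom tags instead of <div> or <table>.
# Assumed to be well-formed; ElementTree will not parse broken HTML.
MARKUP = """
<listing>
  <product-card sku="A1"><product-name>Kettle</product-name><product-price>19.99</product-price></product-card>
  <product-card sku="B2"><product-name>Toaster</product-name><product-price>24.50</product-price></product-card>
</listing>
"""


def extract_products(markup):
    """Return one dict per <product-card> element."""
    root = ET.fromstring(markup)
    products = []
    # One selector per element of interest, keyed on the custom tag name.
    for card in root.findall(".//product-card"):
        products.append({
            "sku": card.get("sku"),
            "name": card.findtext("product-name"),
            "price": float(card.findtext("product-price")),
        })
    return products


products = extract_products(MARKUP)

# An attribute predicate narrows the match to a single card.
toaster = ET.fromstring(MARKUP).find(".//product-card[@sku='B2']")
```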

  • 7 Replies
  • Gojko Diomedes

    Member
    11/19/2024 at 5:40 am

    Scrapy’s XPath expressions are especially helpful for locating non-standard elements by their position in the DOM structure.
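
    To illustrate positional selection, here is a small sketch with the standard library's ElementTree rather than Scrapy itself, so it runs without extra packages. The markup is hypothetical and assumed to be well-formed. In a Scrapy spider, comparable expressions would be passed to `response.xpath(...)`; that part is not exercised here.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows with no classes or ids to key on, only structure.
MARKUP = """
<section>
  <div><span>Kettle</span><span>19.99</span><span>In stock</span></div>
  <div><span>Toaster</span><span>24.50</span><span>Sold out</span></div>
</section>
"""

root = ET.fromstring(MARKUP)

# XPath positions are 1-based: second <div>, then its first and last <span>.
second_name = root.findtext("./div[2]/span[1]")
second_status = root.findtext("./div[2]/span[last()]")

# The second <span> of every row, i.e. the price column.
prices = [span.text for span in root.findall("./div/span[2]")]
```

    Positional selectors break as soon as the site inserts or reorders an element, so it is worth checking the extracted values against an expected shape.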

  • Headley Corrie

    Member
    11/19/2024 at 5:53 am

    Regular expressions can sometimes capture patterns within unconventional tags, though they are less reliable for deeply nested data.
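
    A small sketch of both halves of that point, assuming a hypothetical `<x-item>` tag with a `code` attribute: the pattern works on flat elements and silently misses one that contains a nested tag.

```python
import re

# Hypothetical flat markup using a custom tag.
FLAT = '<x-item code="A1">Kettle</x-item> <x-item code="B2">Toaster</x-item>'

# Same tag, but with a nested element inside it.
NESTED = '<x-item code="C3"><b>Lamp</b></x-item>'

# Captures the code attribute and the text, as long as the text contains no tags.
ITEM_RE = re.compile(r'<x-item\s+code="([^"]+)">([^<]*)</x-item>')

flat_items = ITEM_RE.findall(FLAT)
nested_items = ITEM_RE.findall(NESTED)  # no match: the inner <b> breaks the pattern
```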

  • Elea Aelita

    Member
    11/19/2024 at 6:02 am

    If data is loaded via JavaScript, I look for embedded JSON or XML, which often holds the same data in a structured form that is easier to parse.
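
    A sketch of pulling embedded JSON out of a script tag, assuming the page assigns it to a global variable. The variable name `window.__INITIAL_STATE__` and the page content are hypothetical; the actual name has to be found by inspecting the page source.

```python
import json
import re

# Hypothetical page: the visible markup is empty, the data sits in a script.
HTML = """
<html><body>
<div id="app"></div>
<script>window.__INITIAL_STATE__ = {"products": [{"name": "Kettle", "price": 19.99}]};</script>
</body></html>
"""


def extract_state(html, variable="window.__INITIAL_STATE__"):
    """Return the JSON value assigned to `variable`, or None if it is absent."""
    match = re.search(re.escape(variable) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores what follows, so the trailing ";</script>" needs no trimming.
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


state = extract_state(HTML)
missing = extract_state("<p>no script here</p>")
```

    This only works when the assigned value is strict JSON. A JavaScript object literal with unquoted keys or trailing commas will raise `json.JSONDecodeError`.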

  • Herakles Urias

    Member
    11/19/2024 at 6:21 am

    Inspecting CSS classes and id attributes for unique identifiers helps locate data within unusual or proprietary tags.
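
    As a rough sketch of keying on a class name rather than the tag, the standard library's `html.parser` can collect the text of any element whose class list contains a given token. The `prod-box` tag and the `price-tag` class are made-up names, and the collector is a simplified illustration that does not handle a target element nested inside another target.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list has a target token."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._open_tag = None  # tag name of the element being captured
        self._depth = 0        # nesting depth of same-named tags inside it
        self._buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self._open_tag is not None:
            if tag == self._open_tag:
                self._depth += 1
            return
        # A class attribute may hold several space-separated tokens.
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._open_tag = tag
            self._depth = 1
            self._buffer = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._open_tag is not None and tag == self._open_tag:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buffer).strip())
                self._open_tag = None


# Hypothetical markup with a proprietary tag; only the class identifies prices.
HTML = (
    '<prod-box class="item price-tag" id="p1">19.99</prod-box>'
    '<prod-box class="item">Kettle</prod-box>'
    '<prod-box class="price-tag featured"><em>24.50</em></prod-box>'
)

collector = ClassTextCollector("price-tag")
collector.feed(HTML)
price_texts = collector.results
```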

  • Manoj Fikreta

    Member
    11/19/2024 at 6:32 am

    For sites with a consistent visual layout, rendering the page in a headless browser and locating data by its on-screen position can capture what the markup does not expose.

  • Zusman Mimmi

    Member
    11/19/2024 at 7:36 am

    JSON extraction tools can capture data embedded within scripts, which is common on pages that rely on JavaScript for layout.

  • Nohemi Preben

    Member
    11/19/2024 at 7:49 am

    Parsing meta tags or schema.org markup is also effective, as some sites embed structured data in the page head rather than in HTML tables.
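
    A sketch of reading both sources with the standard library, assuming the schema markup is published as JSON-LD in a `<script type="application/ld+json">` block. The page content is hypothetical, and microdata or RDFa markup would need a different approach.

```python
import json
from html.parser import HTMLParser


class HeadDataParser(HTMLParser):
    """Collect <meta> name/property pairs and JSON-LD blocks from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.json_ld = []
        self._in_json_ld = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Open Graph uses "property", classic meta tags use "name".
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.json_ld.append(json.loads("".join(self._chunks)))
            self._in_json_ld = False


# Hypothetical page head with Open Graph, description, and JSON-LD data.
HTML = """
<html><head>
<meta property="og:title" content="Kettle">
<meta name="description" content="A 1.7 litre kettle">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Kettle",
 "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "EUR"}}
</script>
</head><body></body></html>
"""

parser = HeadDataParser()
parser.feed(HTML)
product = parser.json_ld[0]
```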
