How can I scrape structured data from sites without standard HTML tags? - Rayobyte Community

Posted by Achim Antioco on 11/16/2024 at 6:34 am

CSS and XPath selectors allow for flexible selection even if standard tags aren’t used. I customize selectors for each unique element.
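
A minimal sketch of attribute-based selection on non-standard tags, using only Python's standard library. The `x-catalog`, `x-item`, `x-name`, and `x-price` tags and the `data-sku` attribute are made up for illustration, and the sample is well-formed markup; real pages are often not, in which case a lenient parser such as lxml or parsel is the more usual choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: custom tags instead of <table>/<tr>/<td>.
SAMPLE = """
<x-catalog>
  <x-item data-sku="A1"><x-name>Widget</x-name><x-price>9.50</x-price></x-item>
  <x-item data-sku="B2"><x-name>Gadget</x-name><x-price>24.00</x-price></x-item>
  <x-banner>Free shipping this week</x-banner>
</x-catalog>
"""


def extract_items(markup: str) -> list:
    """Return one dict per x-item element that carries a data-sku attribute."""
    root = ET.fromstring(markup.strip())
    items = []
    # The selector targets the custom tag name plus an attribute test,
    # so the unrelated x-banner element is ignored.
    for node in root.findall(".//x-item[@data-sku]"):
        items.append({
            "sku": node.get("data-sku"),
            "name": node.findtext("x-name"),
            "price": float(node.findtext("x-price")),
        })
    return items


ITEMS = extract_items(SAMPLE)
```

The selector does not care that the tags are non-standard; it only needs a tag name and an attribute that stay stable across pages.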

Nohemi Preben replied 5 months ago 8 Members · 7 Replies

Gojko Diomedes

Member
11/19/2024 at 5:40 am

Scrapy’s XPath expressions are especially helpful for locating non-standard elements by their position in the DOM structure.
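
To illustrate position-based selection, here is a rough sketch that uses the standard library's limited XPath subset rather than Scrapy itself (Scrapy's selectors accept full XPath 1.0, so the same idea carries over). The `x-grid`, `x-row`, and `x-cell` tags and the column meanings are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical grid where cells have no class or id, only a position.
ROWS = """
<x-grid>
  <x-row><x-cell>Berlin</x-cell><x-cell>3.7M</x-cell><x-cell>DE</x-cell></x-row>
  <x-row><x-cell>Madrid</x-cell><x-cell>3.3M</x-cell><x-cell>ES</x-cell></x-row>
</x-grid>
"""


def cells_by_position(markup: str) -> list:
    """Map each row's cells to fields purely by their position in the row."""
    root = ET.fromstring(markup.strip())
    records = []
    for row in root.findall("./x-row"):
        records.append({
            "city": row.findtext("x-cell[1]"),          # first cell
            "population": row.findtext("x-cell[2]"),    # second cell
            "country": row.findtext("x-cell[last()]"),  # final cell
        })
    return records


RECORDS = cells_by_position(ROWS)
```

Positional selectors break when the site adds or reorders columns, so they work best alongside a sanity check on the extracted values.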
Headley Corrie

Member
11/19/2024 at 5:53 am

Regular expressions can sometimes capture patterns inside unconventional tags, though they are less reliable for deeply nested data.
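
A small sketch of the regex approach for a flat, predictable pattern. The `x-cost` tag and its `currency` attribute are hypothetical; the pattern assumes the value sits directly inside the tag with no nested elements, which is exactly where this approach holds up.

```python
import re

# Hypothetical page fragment with a proprietary price tag.
PAGE = (
    '<x-listing><x-title>Desk lamp</x-title>'
    '<x-cost currency="EUR"> 34.90 </x-cost></x-listing>'
    '<x-listing><x-title>Notebook</x-title>'
    '<x-cost id="c2" currency="USD">12.00</x-cost></x-listing>'
)

# Capture the three-letter currency code and the numeric amount.
PRICE_RE = re.compile(
    r'<x-cost\b[^>]*\bcurrency="([A-Z]{3})"[^>]*>\s*([\d.]+)\s*</x-cost>'
)


def find_prices(html: str) -> list:
    """Return (currency, amount) pairs for every x-cost element found."""
    return [(cur, float(amount)) for cur, amount in PRICE_RE.findall(html)]


PRICES = find_prices(PAGE)
```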
Elea Aelita

Member
11/19/2024 at 6:02 am

If the data is loaded via JavaScript, I look for embedded JSON or XML, which often holds the structured data in a form that is easier to parse than the rendered markup.
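
One way this can look in practice, as a sketch: pull the contents of a `<script type="application/json">` block and decode it. The page fragment, the `page-data` id, and the `products` key are invented for the example.

```python
import json
import re

# Hypothetical page that ships its data as a JSON blob for client-side rendering.
PAGE = """
<html><body>
<div id="app"></div>
<script id="page-data" type="application/json">
{"products": [{"name": "Widget", "price": 9.5}, {"name": "Gadget", "price": 24.0}]}
</script>
</body></html>
"""

# Match the body of any script element declared as JSON.
SCRIPT_RE = re.compile(
    r'<script[^>]*type="application/json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def embedded_json(html: str) -> list:
    """Decode every JSON script block found in the page."""
    return [json.loads(body) for body in SCRIPT_RE.findall(html)]


BLOBS = embedded_json(PAGE)
```

Once decoded, the data is ordinary Python dicts and lists, so no selectors are needed at all.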
Herakles Urias

Member
11/19/2024 at 6:21 am

Inspecting CSS classes and id attributes for unique identifiers helps locate data within unusual or proprietary tags.
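
A minimal sketch of matching on class names regardless of the tag used, built on the standard library's `html.parser`. The `x-card` and `x-field` tags and the `item-price` class are hypothetical; the collector assumes matching elements do not overlap one another.

```python
from html.parser import HTMLParser

# Hypothetical markup where only the class names are meaningful.
PAGE = (
    '<x-card id="item-1"><x-field class="item-name">Widget</x-field>'
    '<x-field class="item-price hl">9.50</x-field></x-card>'
    '<x-card id="item-2"><x-field class="item-name">Gadget</x-field>'
    '<x-field class="item-price">24.00</x-field></x-card>'
)


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list contains a target."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # greater than 0 while inside a matching element
        self.open_tag = None  # tag name of the element being collected
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth == 0 and self.target_class in classes:
            self.open_tag, self.depth = tag, 1
        elif self.depth and tag == self.open_tag:
            self.depth += 1   # same tag nested inside the match

    def handle_endtag(self, tag):
        if self.depth and tag == self.open_tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def text_by_class(html: str, target_class: str) -> list:
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


PRICES_BY_CLASS = text_by_class(PAGE, "item-price")
```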
Manoj Fikreta

Member
11/19/2024 at 6:32 am

For sites with a consistent visual layout, rendering the page in a headless browser and locating elements by their pixel coordinates helps capture the data visually.
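
The browser side depends on the tool, but the grouping step can be sketched on its own. The example below assumes a headless browser has already returned each text fragment with the top-left pixel position of its bounding box (for instance from the DOM's `getBoundingClientRect()`); the box values and the `y_tolerance` parameter are made up for illustration.

```python
# Hypothetical (text, x, y) boxes as a headless browser might report them,
# in no particular order.
BOXES = [
    ("24.00", 300, 142),
    ("Widget", 20, 100),
    ("9.50", 300, 101),
    ("Gadget", 20, 140),
]


def rows_from_boxes(boxes, y_tolerance=5):
    """Group boxes into visual rows by y position, then order cells by x."""
    rows = []
    for text, x, y in sorted(boxes, key=lambda b: (b[2], b[1])):
        # A box joins the current row if it sits within y_tolerance pixels
        # of that row's first box; otherwise it starts a new row.
        if rows and abs(y - rows[-1]["y"]) <= y_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]


TABLE = rows_from_boxes(BOXES)
```

The tolerance absorbs small vertical offsets between cells that look aligned on screen; it needs tuning per site.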
Zusman Mimmi

Member
11/19/2024 at 7:36 am

JSON extraction tools can capture data embedded in script blocks, a common pattern on pages that rely on JavaScript to build their layout.
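
When the data is assigned to a JavaScript variable instead of sitting in its own JSON block, one possible approach is to find the assignment and let the JSON decoder read exactly one value from that point. The `window.__INITIAL_STATE__` name and the page fragment are assumptions, and this only works when the literal is strict JSON (quoted keys, no trailing commas).

```python
import json

# Hypothetical inline script that assigns page state to a global variable.
PAGE = """
<script>
  window.__INITIAL_STATE__ = {"user": null, "items": [{"id": 1, "title": "Widget; blue"}]};
  render(window.__INITIAL_STATE__);
</script>
"""


def json_after_marker(html: str, marker: str):
    """Decode the first JSON object that follows marker, or return None."""
    start = html.find(marker)
    if start == -1:
        return None
    brace = html.find("{", start + len(marker))
    if brace == -1:
        return None
    # raw_decode stops at the end of the first complete JSON value, so the
    # trailing semicolon and the rest of the script are ignored.
    obj, _end = json.JSONDecoder().raw_decode(html, brace)
    return obj


STATE = json_after_marker(PAGE, "window.__INITIAL_STATE__ =")
```

Using the decoder instead of a regex avoids cutting the object short at a `}` or `;` that appears inside a string value.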
Nohemi Preben

Member
11/19/2024 at 7:49 am

Parsing meta tags or schema markup is also effective, since some sites embed structured data in the page head rather than in HTML tables.
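
A rough sketch of reading both kinds of head data with the standard library: `<meta>` tags keyed by `property` or `name`, and JSON-LD blocks in `<script type="application/ld+json">`. The sample head and its values are invented for the example.

```python
import json
from html.parser import HTMLParser

# Hypothetical document head with Open Graph meta tags and schema.org JSON-LD.
PAGE = """
<head>
<meta property="og:title" content="Desk lamp">
<meta name="description" content="A small lamp">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Desk lamp",
 "offers": {"@type": "Offer", "price": "34.90", "priceCurrency": "EUR"}}
</script>
</head>
"""


class HeadDataParser(HTMLParser):
    """Collect meta tag contents and decoded JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.json_ld = []
        self._in_ld = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta":
            key = attr.get("property") or attr.get("name")
            if key and attr.get("content") is not None:
                self.meta[key] = attr["content"]
        elif tag == "script" and attr.get("type") == "application/ld+json":
            self._in_ld, self._chunks = True, []

    def handle_data(self, data):
        if self._in_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self.json_ld.append(json.loads("".join(self._chunks)))
            self._in_ld = False


def head_data(html: str):
    parser = HeadDataParser()
    parser.feed(html)
    parser.close()
    return parser.meta, parser.json_ld


META, JSON_LD = head_data(PAGE)
```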
