{"id":1112,"date":"2024-10-06T10:59:24","date_gmt":"2024-10-06T10:59:24","guid":{"rendered":"https:\/\/rayobyte.com\/community\/?post_type=scraping_project&#038;p=1112"},"modified":"2024-10-08T18:23:16","modified_gmt":"2024-10-08T18:23:16","slug":"scraping-wikipedia-with-python-extract-articles-and-metadata","status":"publish","type":"scraping_project","link":"https:\/\/rayobyte.com\/community\/scraping-project\/scraping-wikipedia-with-python-extract-articles-and-metadata\/","title":{"rendered":"Scraping Wikipedia with Python: Extract Articles and Metadata"},"content":{"rendered":"<p style=\"text-align: center;\"><iframe loading=\"lazy\" title=\"YouTube video player\" src=\"https:\/\/www.youtube.com\/embed\/0zPLvMwNmMc?si=JSj_KleS7CDIkFG1\" width=\"560\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p><a href=\"https:\/\/github.com\/MDFARHYN\/wikipedia_scraping\" rel=\"nofollow noopener\" target=\"_blank\">Download all source code from GitHub<\/a><\/p>\n<h1>Table of content<\/h1>\n<ul>\n<li><a href=\"#introduction\">Introduction<\/a><\/li>\n<li><a href=\"#installing-tools\">Installing the Tools You Need<\/a><\/li>\n<li><a href=\"#verify-installations\">Verify Installations<\/a><\/li>\n<li><a href=\"#scrape-and-clean\">How to Scrape and Clean Data<\/a><\/li>\n<li><a href=\"#regex\">Scraping Text with Regular Expressions<\/a><\/li>\n<li><a href=\"#scrape-infobox\">How to Scrape the Wikipedia Infobox<\/a><\/li>\n<li><a href=\"#scrape-tables-pandas\">How to Scrape Wikipedia Tables Using Pandas<\/a><\/li>\n<li><a href=\"#save-csv\">Save Data to CSV<\/a><\/li>\n<li><a href=\"#visualize-data\">Visualize the Data<\/a><\/li>\n<li><a href=\"#custom-database\">Build a Custom Database to Store Wikipedia Data<\/a><\/li>\n<li><a href=\"#error-handling\">Error Handling and Debugging During Scraping<\/a><\/li>\n<li><a href=\"#proxies\">Why Do We Need To Use Proxies in Scraping?<\/a><\/li>\n<li><a href=\"#ethical-scraping\">Ethical Scraping 
and Legal Considerations<\/a><\/li>\n<\/ul>\n<div id=\"introduction\"><\/div>\n<div><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1123 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/final_thumb.png\" alt=\"\" width=\"1024\" height=\"1024\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/final_thumb.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/final_thumb-300x300.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/final_thumb-150x150.png 150w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/final_thumb-768x768.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/final_thumb-624x624.png 624w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/div>\n<h1><b>Introduction<\/b><\/h1>\n<p>Have you ever spent hours on Wikipedia, hopping from one page to another, only to realize how much interesting information you&#8217;ve come across? Now imagine if all that data could be automatically collected\u2014that&#8217;s where web scraping comes in!<\/p>\n<p>In this project, I\u2019ll walk you through how to scrape Wikipedia. You&#8217;ll learn how to extract data from infoboxes (those side boxes on Wikipedia), tables (which can be tricky), and plain text. I\u2019ll also cover how to clean and store that data in useful formats like CSV files, and even show you how to set up a small database.<\/p>\n<p>As a bonus, I\u2019ll provide tips on handling errors and scraping ethically to ensure you&#8217;re doing it the right way. 
Whether you&#8217;re curious or need data for a project, I\u2019ll guide you through each step in a simple, easy-to-understand way.<\/p>\n<div id=\"installing-tools\">\n<h1><b>Installing the Tools You Need<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">First things first, you need the right tools to scrape data from Wikipedia. These tools make data scraping, cleaning, and visualization much easier. Just follow these steps to get them installed.<\/span><\/p>\n<p><b>Pandas: <\/b><span style=\"font-weight: 400;\">\u00a0For data cleaning, organization, and analysis.<\/span><\/p>\n<p><b>Matplotlib:<\/b><span style=\"font-weight: 400;\"> This is used to create plots and graphs (data visualization).<\/span><\/p>\n<p><b>BeautifulSoup (bs4):<\/b><span style=\"font-weight: 400;\">\u00a0 For scraping and parsing HTML content.<\/span><\/p>\n<p><b>Requests: <\/b><span style=\"font-weight: 400;\">For sending HTTP requests and fetching web pages.<\/span><\/p>\n<p><b>\u00a0<\/b><b>Let me walk you through installing each of them.<\/b><\/p>\n<p><b>Step 1: Set Up Python<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Before anything else, ensure that you have Python installed on your computer. If not, you can download it from the official Python website <\/span><a href=\"https:\/\/www.python.org\" rel=\"nofollow noopener\" target=\"_blank\"><span style=\"font-weight: 400;\">https:\/\/www.python.org<\/span><\/a><span style=\"font-weight: 400;\">. Don&#8217;t forget to check &#8220;Add Python to PATH&#8221; before installing. 
It simplifies working from the command line with Python.<\/span><\/p>\n<p><b>Step 2: Install pandas and matplotlib<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Pandas helps you structure data into what is essentially a table, and Matplotlib is very useful for turning that data into charts or graphs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To install both, copy and paste the following into your terminal or command prompt.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">pip install pandas matplotlib<\/pre>\n<p><span style=\"font-weight: 400;\">Press Enter and both packages will be installed.<\/span><\/p>\n<p><b>Step 3: Install BeautifulSoup &amp; Requests<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Requests is the library we use to download pages, and BeautifulSoup helps us parse each page so we can extract exactly what we want.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Run the following command to install them.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">pip install beautifulsoup4 requests<\/pre>\n<p><span style=\"font-weight: 400;\">This installs both libraries, which is all we need to fetch and parse Wikipedia pages.<\/span><\/p>\n<div id=\"verify-installations\"><\/div>\n<h1><b>Verify Installations<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">Once everything is installed, you can check that it all works by printing the version of each package with the following commands:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">python -c \"import pandas; print(pandas.__version__)\"\r\npython -c \"import matplotlib; print(matplotlib.__version__)\"\r\npython -c \"import bs4; print(bs4.__version__)\"\r\npython -c \"import requests; print(requests.__version__)\"\r\n<\/pre>\n<p><span style=\"font-weight: 400;\">If you see version numbers for all of these, like in the screenshot below, then you are 
good.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1113 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_check_package_version.png\" alt=\"wiki_check_package_version\" width=\"1919\" height=\"1031\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_check_package_version.png 1919w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_check_package_version-300x161.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_check_package_version-1024x550.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_check_package_version-768x413.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_check_package_version-1536x825.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_check_package_version-624x335.png 624w\" sizes=\"auto, (max-width: 1919px) 100vw, 1919px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">You have all the tools installed, now it is time to scrape Wikipedia. Let\u2019s get started!<\/span><\/p>\n<div id=\"scrape-and-clean\"><\/div>\n<h1><b>How to Scrape and Clean Data<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">When it comes to web scraping, Wikipedia is no different: you fetch the HTML content and then clean it to get usable data. Wikipedia pages contain many nonessential elements (e.g. 
HTML tags, references, special characters), so data cleaning is required to retain only the useful information.<\/span><\/p>\n<p><b>Step 1: Import the Required Libraries<\/b><\/p>\n<p><span style=\"font-weight: 400;\">First, we will import requests, BeautifulSoup, and the time module (used below to measure how long the request takes)\u00a0<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\nimport time<\/pre>\n<p><b>Step 2: Opening the Wikipedia Page Using a GET Request<\/b><\/p>\n<p><span style=\"font-weight: 400;\">We request the Wikipedia page, in this example &#8220;Python (programming language)&#8221;, using an HTTP GET request.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">url = 'https:\/\/en.wikipedia.org\/wiki\/Python_(programming_language)'\r\n\r\nstart_time = time.time()\r\nresponse = requests.get(url)\r\n\r\n# Check if the status code is 200, indicating a successful request\r\nif response.status_code == 200:\r\n    print(f\"Page fetched in {time.time() - start_time} seconds\")\r\nelse:\r\n    print(\"Unable to download the page\")<\/pre>\n<p><b>Step 3: Use BeautifulSoup to Parse the HTML Page<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Once the page is fetched, BeautifulSoup parses the raw HTML into a structured format that we can navigate.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">soup = BeautifulSoup(response.text, 'html.parser')<\/pre>\n<p><span style=\"font-weight: 400;\">soup now holds the whole HTML structure of the page, which we can parse further.<\/span><\/p>\n<p><b>Step 4: Data Extraction\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0Here, we use `soup.find` and `soup.select` to target the title and a paragraph in the HTML.\u00a0<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\"># Extract the title and paragraph\r\nh1_title = soup.find('h1')\r\nprint(\"h1---&gt;\", h1_title)\r\n\r\nparagraph = soup.select('p:nth-of-type(3)')[0]\r\nprint(\"paragraph---&gt;\", paragraph)<\/pre>\n<p><b>Step 5: Data Cleaning<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Wikipedia text is often filled with extraneous characters, such as citation numbers ([1], [2]). These are the parts that we must remove.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Get the text, then strip newlines and extra spaces:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">paragraph = soup.select('p:nth-of-type(3)')[0].get_text().strip().replace('\\n', ' ')\r\n\r\nprint(\"paragraph---&gt;\", paragraph)<\/pre>\n<p><span style=\"font-weight: 400;\">`strip()` removes leading and trailing spaces, and `replace('\\n', ' ')` replaces newlines with spaces.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Remove citation references like [1], [2]:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import re\r\n\r\nreg_pattern = r'\\[\\d+\\]'\r\ncleaned_paragraph = re.sub(reg_pattern, '', paragraph)\r\nprint(\"cleaned_paragraph\", cleaned_paragraph)<\/pre>\n<p><span style=\"font-weight: 400;\">\u00a0<\/span><span style=\"font-weight: 400;\">Example Output<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Original Text:<\/span><\/p>\n<p>&#8220;&#8221;&#8221;Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. 
It is often described as a &#8220;batteries included&#8221; language<br \/>\ndue to its comprehensive standard library.[33][34]&#8221;&#8221;&#8221;<\/p>\n<p><span style=\"font-weight: 400;\">Cleaned Text:<\/span><\/p>\n<p>&#8220;&#8221;&#8221;Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural),<br \/>\nobject-oriented and functional programming. It is often described as a &#8220;batteries included&#8221;<br \/>\nlanguage due to its comprehensive standard library.&#8221;&#8221;&#8221;<\/p>\n<p><span style=\"font-weight: 400;\">Full code:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\nimport time\r\nimport re\r\n\r\n# The URL of the Wikipedia page you want to scrape\r\nurl = 'https:\/\/en.wikipedia.org\/wiki\/Python_(programming_language)'\r\n\r\n# Start the timer to measure how long the request takes\r\nstart_time = time.time()\r\n\r\n# Send an HTTP GET request to fetch the page content\r\nresponse = requests.get(url)\r\n\r\n# Check if the request was successful (status code 200 means success)\r\nif response.status_code == 200:\r\n    print(f\"Page fetched in {time.time() - start_time} seconds\")  # Print the time taken to fetch the page\r\nelse:\r\n    print(\"Unable to download the page\")  # Print an error message if the request fails\r\n\r\n# Parse the page content using BeautifulSoup and 'html.parser' to process the HTML\r\nsoup = BeautifulSoup(response.text, 'html.parser')\r\n\r\n# Extract the first &lt;h1&gt; tag (usually the title of the Wikipedia article)\r\nh1_title = soup.find('h1').get_text().strip()  # .strip() removes any extra spaces or newlines\r\nprint(\"h1---&gt;\", h1_title)  # Print the extracted title\r\n\r\n# Extract the 3rd &lt;p&gt; (paragraph) tag from the page\r\n# We use nth-of-type(3) to select the 3rd paragraph, and get its 
text content, then clean up newlines and extra spaces\r\nparagraph = soup.select('p:nth-of-type(3)')[0].get_text().strip().replace('\\n', ' ')\r\nprint(\"paragraph---&gt;\", paragraph)  # Print the raw paragraph text\r\n\r\n# Define a regular expression pattern to find and remove references like [1], [2] from the text\r\nreg_pattern = r'\\[\\d+\\]'\r\n\r\n# Use re.sub() to substitute and remove any matches of the regex pattern (i.e., the reference numbers)\r\ncleaned_paragraph = re.sub(reg_pattern, '', paragraph)\r\n\r\n# Print the cleaned paragraph, which now has no reference numbers\r\nprint(\"cleaned_paragraph---&gt;\", cleaned_paragraph)<\/pre>\n<div id=\"regex\"><\/div>\n<h1><b>Scraping Text with Regular Expressions<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">Regex (regular expressions) is extremely powerful for finding patterns in text, which makes it perfect for scraping structured data out of Wikipedia. As mentioned above, if you would like to clean unwanted parts (e.g. citations such as [1], [2]) out of a string, regex can help tremendously.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import re\r\n\r\n# Example of text with citations\r\ntext = \"Python is a high-level programming language.[1] It was created in 1991.[2]\"\r\n\r\n# Regular expression to remove citations like [1], [2]\r\ncleaned_text = re.sub(r'\\[\\d+\\]', '', text)\r\n\r\nprint(cleaned_text)<\/pre>\n<p><span style=\"font-weight: 400;\">In this example:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The regular expression <code>r'\\[\\d+\\]'<\/code> matches any citation marker (i.e., [1], [2], etc.), and <code>re.sub()<\/code><\/span> replaces each match with an empty string, thus deleting them.<\/p>\n<p><span style=\"font-weight: 400;\">\u00a0<\/span><span style=\"font-weight: 400;\">For more complex data extraction, e.g. 
extracting tables, infoboxes or other sections of a page, this method can be further elaborated.<\/span><\/p>\n<div id=\"scrape-infobox\"><\/div>\n<h1><b>How to Scrape the Wikipedia Infobox<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">The infobox on a Wikipedia page is that box on the right side with key facts, like important dates, names, and other structured information. It&#8217;s super useful when you want to grab summarized data quickly. Scraping the infobox is pretty simple because it follows a consistent structure across Wikipedia pages.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Here\u2019s how you can scrape the infobox using Python and <\/span><span style=\"font-weight: 400;\">BeautifulSoup<\/span><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><b>Step-by-Step Code for Scraping a Wikipedia Infobox<\/b><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n# Wikipedia page URL\r\nurl = 'https:\/\/en.wikipedia.org\/wiki\/Python_(programming_language)'\r\n\r\n# Fetch the page\r\nresponse = requests.get(url)\r\n\r\n# Check if the request was successful\r\nif response.status_code == 200:\r\n    # Parse the HTML content\r\n    soup = BeautifulSoup(response.text, 'html.parser')\r\n    \r\n    # Find the infobox table (usually has class \"infobox\")\r\n    infobox = soup.find('table', {'class': 'infobox'})\r\n\r\n    # Find all rows within the infobox\r\n    rows = infobox.find_all('tr')\r\n\r\n    # Loop through rows and extract header (th) and data (td)\r\n    for row in rows:\r\n        header = row.find('th')  # Header cell (like \"Developer\")\r\n        data = row.find('td')    # Data cell (like \"Python Software Foundation\")\r\n\r\n        if header and data:\r\n            print(f\"{header.get_text(strip=True)}: {data.get_text(strip=True)}\")\r\nelse:\r\n    print(\"Failed to fetch the page\")\r\n\r\n<\/pre>\n<h3><b>Explanation:<\/b><\/h3>\n<ol>\n<li 
style=\"font-weight: 400;\"><b>Request the Page<\/b><span style=\"font-weight: 400;\">: We use <\/span><span style=\"font-weight: 400;\">requests.get()<\/span><span style=\"font-weight: 400;\"> to fetch the page content from the URL.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Parse HTML with BeautifulSoup<\/b><span style=\"font-weight: 400;\">: Once we get the page, <\/span><span style=\"font-weight: 400;\">BeautifulSoup<\/span><span style=\"font-weight: 400;\"> helps turn that HTML into something we can easily navigate.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Find the Infobox<\/b><span style=\"font-weight: 400;\">: We search for the infobox, which is usually inside a <\/span><span style=\"font-weight: 400;\">&lt;table&gt;<\/span><span style=\"font-weight: 400;\"> with the class <\/span><span style=\"font-weight: 400;\">&#8220;infobox&#8221;<\/span><span style=\"font-weight: 400;\">.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Extract Data<\/b><span style=\"font-weight: 400;\">: We loop through each row (<\/span><span style=\"font-weight: 400;\">&lt;tr&gt;<\/span><span style=\"font-weight: 400;\">) in the infobox, and for each row, we extract the header (usually in a <\/span><span style=\"font-weight: 400;\">&lt;th&gt;<\/span><span style=\"font-weight: 400;\"> tag) and the data (in a <\/span><span style=\"font-weight: 400;\">&lt;td&gt;<\/span><span style=\"font-weight: 400;\"> tag).<\/span><\/li>\n<\/ol>\n<h3><b>Example Output:<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">When you run this code, you&#8217;ll get something like this:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">Developer: Python Software Foundation\r\nFirst appeared: 20 February 1991; 33 years ago(1991-02-20)[2]\r\nStable release: 3.12.7\/ 1 October 2024; 4 days ago(1 October 2024)<\/pre>\n<p><span style=\"font-weight: 400;\">This way, you get all the key details from the Wikipedia infobox in a neat and structured format! 
Simple and effective.<\/span><\/p>\n<div id=\"scrape-tables-pandas\"><\/div>\n<h1><b>How to Scrape Wikipedia Tables Using Pandas<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">Scraping tables from Wikipedia is super easy with <\/span><b>pandas<\/b><span style=\"font-weight: 400;\">, which has a built-in method for extracting HTML tables directly. No need to dig into the HTML structure manually \u2014 you can grab tables with just one line of code. This makes it perfect for quickly pulling structured data, like tables of countries, statistics, or rankings.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Here\u2019s how to do it using <\/span><span style=\"font-weight: 400;\">pandas.read_html()<\/span><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0<\/span><b>Code for Scraping Wikipedia Tables with Pandas<\/b><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import pandas as pd\r\n\r\n# Wikipedia page URL\r\nurl = 'https:\/\/en.wikipedia.org\/wiki\/List_of_countries_and_dependencies_by_population'\r\n\r\n# Use pandas' read_html to scrape all tables from the page\r\ntables = pd.read_html(url)\r\n\r\n# Check how many tables were found\r\nprint(f\"Total tables found: {len(tables)}\")\r\n\r\n# Display the first table (index 0)\r\ndf = tables[0]\r\nprint(df.head())\r\n\r\n# Save the DataFrame to a CSV file\r\ndf.to_csv('wikipedia_data.csv', index=False)<\/pre>\n<h3><b>Explanation:<\/b><\/h3>\n<ol>\n<li style=\"font-weight: 400;\"><b>read_html()<\/b><span style=\"font-weight: 400;\">: Pandas `read_html()` is a built-in function that automatically parses the given URL and extracts all tables from it. 
<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Extracting Tables<\/b><span style=\"font-weight: 400;\">: In the example above, `<\/span><span style=\"font-weight: 400;\">tables = pd.read_html(url)`<\/span><span style=\"font-weight: 400;\"> pulls all the tables from the Wikipedia page and stores them in a list.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Viewing the Table<\/b><span style=\"font-weight: 400;\">: You can select and display a specific table by indexing into the list (e.g., <\/span><span style=\"font-weight: 400;\">tables[0]<\/span><span style=\"font-weight: 400;\"> for the first table) and turning it into a DataFrame for easy analysis.<\/span><\/li>\n<\/ol>\n<h3><b>Example Output:<\/b><\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">\u00a0 \u00a0 Rank Country\/Dependency\u00a0 Population\r\n0 \u00a0 \u00a0 1\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 China\u00a0 1,412,600,000\r\n1 \u00a0 \u00a0 2\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 India\u00a0 1,366,000,000\r\n2 \u00a0 \u00a0 3\u00a0 \u00a0 United States \u00a0 331,883,986<\/pre>\n<h3><b>Why Use Pandas for Scraping Tables?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\"><b>Simplicity<\/b><span style=\"font-weight: 400;\">: With `<\/span><span style=\"font-weight: 400;\">pandas.read_html()`<\/span><span style=\"font-weight: 400;\">, you don\u2019t need to worry about the HTML structure at all. 
It automatically parses the tables for you.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Multiple Tables<\/b><span style=\"font-weight: 400;\">: It can handle multiple tables on the page and store them in a list of DataFrames, which is ideal when you need to scrape multiple sets of data at once.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Easy Export<\/b><span style=\"font-weight: 400;\">: You can easily export the tables to a CSV or Excel file for further analysis with pandas.<\/span><\/li>\n<\/ul>\n<div id=\"save-csv\"><\/div>\n<h1><b>Save Data to CSV<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">After you finish scraping the data, it is often useful to save it for later use or processing. The simplest option is to save the data to a CSV file, a plain-text, comma-separated table format that many applications can read. Here is a quick way to save the scraped data to a CSV file using pandas.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">df.to_csv('wikipedia_data.csv', index=False)<\/pre>\n<p><span style=\"font-weight: 400;\">\u00a0<\/span><span style=\"font-weight: 400;\">Explanation:<\/span><\/p>\n<p><b>to_csv():<\/b><span style=\"font-weight: 400;\"> This function saves your DataFrame (df) in a file called wikipedia_data.csv.<\/span><\/p>\n<p><b>index=False:<\/b><span style=\"font-weight: 400;\"> This prevents the DataFrame index from being saved in the CSV file as an additional column.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Here is a screenshot of what the CSV result will look like:<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1115 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_csv.png\" alt=\"\" width=\"1918\" height=\"1016\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_csv.png 1918w, 
https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_csv-300x159.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_csv-1024x542.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_csv-768x407.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_csv-1536x814.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_csv-624x331.png 624w\" sizes=\"auto, (max-width: 1918px) 100vw, 1918px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<div id=\"visualize-data\"><\/div>\n<h1><b>Visualize the Data<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">Once your scraped data is organized, visualizing it makes patterns and insights much easier to spot. For our example, we will plot the top 10 countries by population, scraped from Wikipedia, as a simple bar chart.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0Here is the code for the visualization, step by step.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import pandas as pd\r\nimport matplotlib.pyplot as plt\r\n\r\n# Wikipedia page URL\r\nurl = 'https:\/\/en.wikipedia.org\/wiki\/List_of_countries_and_dependencies_by_population'\r\n\r\n# Scrape the table using pandas\r\ntables = pd.read_html(url)\r\n\r\n# Extract the first table (usually the most relevant one)\r\ndf = tables[0]\r\n\r\n# Clean the data: Remove any rows with missing 'Population' data\r\ndf = df.dropna(subset=['Population'])\r\n\r\n# Ensure 'Population' column is treated as string, then keep only the digits and convert to integers\r\ndf['Population'] = df['Population'].astype(str).str.replace(',', '').str.extract(r'(\\d+)', expand=False).astype(int)\r\n\r\n# Select the top 10 countries by population\r\ntop_10 = df[['Location', 'Population']].head(10)  # Use 'Location' as the country name\r\n\r\n# Plot a bar 
chart\r\nplt.figure(figsize=(10, 6))\r\nplt.bar(top_10['Location'], top_10['Population'], color='skyblue')\r\n\r\n# Add labels and title\r\nplt.xlabel('Country')\r\nplt.ylabel('Population')\r\nplt.title('Top 10 Most Populated Countries')\r\nplt.xticks(rotation=45)  # Rotate country names for better readability\r\nplt.tight_layout()  # Adjust layout to prevent label cutoff\r\n\r\n# Show the plot\r\nplt.show()<\/pre>\n<h3><b>Explanation:<\/b><\/h3>\n<ol>\n<li style=\"font-weight: 400;\"><b>Scrape the Table<\/b><span style=\"font-weight: 400;\">: We use `<\/span><span style=\"font-weight: 400;\">pandas.read_html()`<\/span><span style=\"font-weight: 400;\">\u00a0to scrape the table from Wikipedia, which returns a list of DataFrames. We work with the first table (`<\/span><span style=\"font-weight: 400;\">tables[0]`<\/span><span style=\"font-weight: 400;\">).<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Clean the Data<\/b><span style=\"font-weight: 400;\">: We clean the &#8216;Population&#8217; column by removing commas and converting the population figures from strings to integers. 
We also remove any rows with missing population data.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Select Top 10 Countries<\/b><span style=\"font-weight: 400;\">: Using `<\/span><span style=\"font-weight: 400;\">df.head(10)`<\/span><span style=\"font-weight: 400;\">, we select the first 10 rows, which represent the top 10 countries by population.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Create a Bar Chart<\/b><span style=\"font-weight: 400;\">:<\/span>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">We use `<\/span><span style=\"font-weight: 400;\">plt.bar()`<\/span><span style=\"font-weight: 400;\"> to create a bar chart of the top 10 most populated countries.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The `<\/span><span style=\"font-weight: 400;\">xlabel`<\/span><span style=\"font-weight: 400;\"> and `<\/span><span style=\"font-weight: 400;\">ylabel`<\/span><span style=\"font-weight: 400;\"> functions add labels to the axes, and <\/span><span style=\"font-weight: 400;\">title<\/span><span style=\"font-weight: 400;\"> sets the chart title.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">`xticks(rotation=45)`<\/span><span style=\"font-weight: 400;\"> rotates the country names on the x-axis for better readability.<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400;\"><b>Show the Plot<\/b><span style=\"font-weight: 400;\">: Finally,` <\/span><span style=\"font-weight: 400;\">plt.show()`<\/span><span style=\"font-weight: 400;\"> displays the bar chart.<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">Here is a screenshot of how the Matplotlib visual result will look:<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1117 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_visual.png\" alt=\"wiki_visual\" width=\"1903\" height=\"750\" title=\"\" 
srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_visual.png 1903w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_visual-300x118.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_visual-1024x404.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_visual-768x303.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_visual-1536x605.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_visual-624x246.png 624w\" sizes=\"auto, (max-width: 1903px) 100vw, 1903px\" \/><\/p>\n<h1 id=\"custom-database\"><\/h1>\n<h1><b>Build a Custom Database to Store Wikipedia Data<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">It is a good approach to store the data into your custom database. It enables you to work efficiently with a large set of data, and do heavy lifting queries. Let&#8217;s learn how you can store your scraped wikipedia data in an SQLite database\u00a0<\/span><\/p>\n<h3><b>Step-by-Step Code to Build and Store Data in SQLite<\/b><\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import pandas as pd\r\nimport sqlite3\r\n\r\n# Wikipedia page URL\r\nurl = 'https:\/\/en.wikipedia.org\/wiki\/List_of_countries_and_dependencies_by_population'\r\n\r\n# Scrape the table using pandas\r\ntables = pd.read_html(url)\r\n\r\n# Extract the first table (usually the most relevant one)\r\ndf = tables[0]\r\n\r\n# Clean the data: Remove any rows with missing 'Population' data\r\ndf = df.dropna(subset=['Population'])\r\n\r\n# Ensure 'Population' column is treated as string, then remove commas and convert to 64-bit integers\r\ndf['Population'] = df['Population'].astype(str).str.replace(',', '').str.extract('(d+)').astype('int64')\r\n\r\n# Select relevant columns\r\ndf = df[['Location', 'Population']]\r\n\r\n# Connect to SQLite (or create the database if it doesn't exist)\r\nconn = 
sqlite3.connect('wikipedia_data.db')\r\n\r\n# Store the data in a new table called 'countries_population'\r\ndf.to_sql('countries_population', conn, if_exists='replace', index=False)\r\n\r\n# Confirm the data is stored by querying the database\r\nresult_df = pd.read_sql('SELECT * FROM countries_population', conn)\r\nprint(result_df.head())\r\n\r\n# Close the connection\r\nconn.close()<\/pre>\n<p><b>Explanation:<\/b><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><b>Scrape the Data<\/b><span style=\"font-weight: 400;\">: As in our previous examples, we scrape a table from a Wikipedia page and clean the data by removing missing values and converting the population column into integers.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Connect to SQLite<\/b><span style=\"font-weight: 400;\">: We use `<\/span><span style=\"font-weight: 400;\">sqlite3.connect()`<\/span><span style=\"font-weight: 400;\"> to create a connection to an SQLite database. If the database doesn\u2019t exist, SQLite will create it for you. In this example, the database is named <\/span><span style=\"font-weight: 400;\">`wikipedia_data.db`<\/span><span style=\"font-weight: 400;\">.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Store Data in the Database<\/b><span style=\"font-weight: 400;\">:<\/span>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">`df.to_sql()`<\/span><span style=\"font-weight: 400;\"> saves the DataFrame as a new table in the SQLite database. 
The table is named <\/span><span style=\"font-weight: 400;\">`countries_population`<\/span><span style=\"font-weight: 400;\">, and the `<\/span><span style=\"font-weight: 400;\">if_exists=&#8217;replace&#8217;`<\/span><span style=\"font-weight: 400;\"> option ensures that any existing table with the same name will be replaced.<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400;\"><b>Query the Database<\/b><span style=\"font-weight: 400;\">: To confirm the data was successfully stored, we use `<\/span><span style=\"font-weight: 400;\">pd.read_sql()`<\/span><span style=\"font-weight: 400;\"> to run a SQL query `(<\/span><span style=\"font-weight: 400;\">SELECT * FROM countries_population<\/span><span style=\"font-weight: 400;\">)` that retrieves all the data from the table. We then print out the first few rows to verify the data.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Close the Connection<\/b><span style=\"font-weight: 400;\">: After we\u2019re done, we close the database connection using `<\/span><span style=\"font-weight: 400;\">conn.close()`<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">See the screenshot of the results from our SQLite database<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1119 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_sql_light.png\" alt=\"wiki_sql_light\" width=\"1305\" height=\"782\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_sql_light.png 1305w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_sql_light-300x180.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_sql_light-1024x614.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_sql_light-768x460.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/wiki_sql_light-624x374.png 624w\" sizes=\"auto, (max-width: 1305px) 100vw, 
1305px\" \/><\/p>\n<div id=\"error-handling\"><\/div>\n<h1><b>Error Handling and Debugging During Scraping<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">Errors are bound to occur while scraping websites like Wikipedia: the connection may fail, an element may be missing, or data may arrive in an unexpected format. Handling and recording these errors is key to debugging and makes your scraper far more reliable. A simple way to do this is with Python\u2019s built-in logging module, which lets you see what is happening inside your script without interrupting its execution.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Here\u2019s how to handle common errors during scraping:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nimport logging\r\nfrom bs4 import BeautifulSoup\r\n\r\n# Set up logging to log to a file\r\nlogging.basicConfig(filename='scraping.log',\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 level=logging.INFO,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 format='%(asctime)s - %(levelname)s - %(message)s')\r\n\r\nurl = 'https:\/\/en.wikipedia.org\/wiki\/Python_(programming_language)'\r\n\r\ntry:\r\n\u00a0 \u00a0 # Fetch the webpage\r\n\u00a0 \u00a0 response = requests.get(url)\r\n\u00a0 \u00a0 response.raise_for_status()\u00a0 # Raise an error for bad requests\r\n\u00a0 \u00a0 logging.info(f\"Successfully fetched the page: {url}\")\r\nexcept requests.exceptions.RequestException as e:\r\n\u00a0 \u00a0 logging.error(f\"Error fetching the page: {url} | {e}\")\r\n\u00a0 \u00a0 exit()\r\n\r\n# Parse the page content\r\nsoup = BeautifulSoup(response.text, 'html.parser')\r\n\r\n# Safely find an element and log if missing\r\ninfobox = soup.find('table', {'class': 'infobox'})\r\nif infobox:\r\n\u00a0 \u00a0 logging.info(\"Infobox found!\")\r\nelse:\r\n\u00a0 \u00a0 
logging.warning(\"Infobox not found!\")<\/pre>\n<h3><b>Explanation:<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\"><b>Logging setup<\/b><span style=\"font-weight: 400;\">: Logs messages to a file (<\/span><span style=\"font-weight: 400;\">`scraping.log`<\/span><span style=\"font-weight: 400;\">) with timestamps.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Error Handling<\/b><span style=\"font-weight: 400;\">: Catches connection errors with `<\/span><span style=\"font-weight: 400;\">try-except`<\/span><span style=\"font-weight: 400;\"> and logs them.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Debugging<\/b><span style=\"font-weight: 400;\">: Checks if the `<\/span><span style=\"font-weight: 400;\">infobox`<\/span><span style=\"font-weight: 400;\"> exists and logs a warning if it\u2019s missing.<\/span><\/li>\n<\/ul>\n<div id=\"proxies\"><\/div>\n<h1><b>Why Do We Need To Use Proxies in Scraping?<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">When scraping large websites such as Wikipedia, it\u2019s essential not to overwhelm the server by sending too many requests at once. If a site notices a flood of requests coming from you, it may block your IP address.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this tutorial, I\u2019m using <\/span><a href=\"https:\/\/rayobyte.com\/\"><b>Rayobyte Proxy<\/b><\/a><span style=\"font-weight: 400;\"> (you can use any proxy service you prefer). 
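Whichever provider you use, it also pays to throttle your own request rate so the target server is never flooded. Below is a minimal sketch; the `RateLimiter` class is my own illustrative helper, not part of any library:

```python
import time

class RateLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_delay=1.0):
        self.min_delay = min_delay  # seconds to wait between requests
        self._last = None           # monotonic timestamp of the last request

    def wait(self):
        # Sleep only as long as needed to honour the minimum delay
        if self._last is not None:
            elapsed = time.monotonic() - self._last
            if elapsed < self.min_delay:
                time.sleep(self.min_delay - elapsed)
        self._last = time.monotonic()

# Usage sketch: call limiter.wait() before each requests.get(...)
limiter = RateLimiter(min_delay=1.0)
```

Calling `limiter.wait()` before every request guarantees at least one second between hits, no matter how fast the rest of your loop runs.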
Here\u2019s a simple demo of how to use proxies in your scraping code.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nimport logging\r\nfrom bs4 import BeautifulSoup\r\n\r\n# Set up logging\r\nlogging.basicConfig(filename='scraping.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')\r\n\r\n# Proxy setup (replace with your proxy details)\r\nproxies = {\r\n\u00a0 \u00a0 \"https\": \"http:\/\/PROXY_USERNAME:PROXY_PASS@PROXY_SERVER:PROXY_PORT\/\"\r\n}\r\n\r\nurl = 'https:\/\/en.wikipedia.org\/wiki\/Python_(programming_language)'\r\n\r\ntry:\r\n\u00a0 \u00a0 # Send request through a proxy\r\n\u00a0 \u00a0 response = requests.get(url, proxies=proxies)\r\n\u00a0 \u00a0 response.raise_for_status()\u00a0 # Check if the request was successful\r\n\u00a0 \u00a0 logging.info(f\"Successfully fetched the page: {url} using proxy\")\r\nexcept requests.exceptions.RequestException as e:\r\n\u00a0 \u00a0 logging.error(f\"Error fetching the page with proxy: {e}\")\r\n\u00a0 \u00a0 exit()\r\n\r\n# Parse the content\r\nsoup = BeautifulSoup(response.text, 'html.parser')\r\nprint(soup.title.string)\u00a0 # Example: print the title of the page<\/pre>\n<div id=\"ethical-scraping\">\n<h1><b>Ethical Scraping and Legal Considerations<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">Scraping is useful, but it&#8217;s important to be ethical and follow the rules to avoid trouble.<\/span><\/p>\n<p><b>Check `robots.txt`: <\/b><span style=\"font-weight: 400;\">See what the site allows for scraping.<\/span><\/p>\n<p><b>Don\u2019t overload servers:<\/b><span style=\"font-weight: 400;\"> Add delays to avoid overwhelming the site.<\/span><\/p>\n<p><b>Avoid sensitive data:<\/b><span style=\"font-weight: 400;\"> Only scrape public information.<\/span><\/p>\n<p><b>Follow Terms of Service: <\/b><span style=\"font-weight: 400;\">Some sites don\u2019t allow scraping.<\/span><\/p>\n<p><span 
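style=\"font-weight: 400;\">The first point above is easy to automate: Python\u2019s standard-library `urllib.robotparser` can read a site\u2019s robots.txt rules for you. Here is a minimal sketch that parses an inline example ruleset rather than fetching a live file:<\/span><\/p>

```python
from urllib.robotparser import RobotFileParser

# Parse an example robots.txt body directly; for a live site you would
# call rp.set_url("https://en.wikipedia.org/robots.txt") and rp.read().
rules = """\
User-agent: *
Disallow: /private/
"""
rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("MyScraper", "https://example.com/wiki/Python"))   # True
print(rp.can_fetch("MyScraper", "https://example.com/private/page"))  # False
```

<p><span 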
style=\"font-weight: 400;\">Respect copyright: Give credit when using data.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Staying ethical and following legal guidelines ensures your scraping is responsible and avoids issues.<\/span><\/p>\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=0zPLvMwNmMc\" rel=\"nofollow noopener\" target=\"_blank\">Watch the tutorial on YouTube<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/github.com\/MDFARHYN\/wikipedia_scraping\" rel=\"nofollow noopener\" target=\"_blank\">\u00a0Download all source code from GitHub<\/a><\/p>\n<p><a href=\"https:\/\/farhyn.com\/\" rel=\"nofollow noopener\" target=\"_blank\">my website<\/a><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Download all source code from GitHub Table of content Introduction Installing the Tools You Need Verify Installations How to Scrape and Clean Data Scraping Text&hellip;<\/p>\n","protected":false},"author":23,"featured_media":1121,"comment_status":"open","ping_status":"closed","template":"","meta":{"rank_math_lock_modified_date":false},"categories":[],"class_list":["post-1112","scraping_project","type-scraping_project","status-publish","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/scraping_project\/1112","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/scraping_project"}],"about":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/types\/scraping_project"}],"author":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/users\/23"}],"replies":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/comments?post=1112"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/media\/1121"}],"wp:attachment":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/media?parent=1112"}],"wp:term":[{"taxonomy":"category","embeddable":tru
e,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/categories?post=1112"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}