  • How to scrape movie titles and links on YesMovies.org (unblocked) using Python?

    Posted by Eulogia Suad on 12/11/2024 at 8:22 am

    Scraping movie titles and links from YesMovies.org (unblocked) can help gather data for personal use, such as building a watchlist or analyzing trends. Sites like YesMovies often render content dynamically with JavaScript and employ anti-scraping measures, so Python with Selenium, which drives a real browser, is a practical choice. Start by inspecting the page structure to find the classes or IDs that hold the movie titles and links. Selenium can also automate interactions such as scrolling or clicking so that all content is loaded before you scrape. Here’s an example of using Selenium to scrape movie titles and links:

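    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Initialize the WebDriver
    driver = webdriver.Chrome()
    try:
        # Placeholder URL and class names; inspect the real page and adjust them
        driver.get("https://example.com/movies")
        # Wait up to 10 seconds for at least one movie card to appear
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "movie-item"))
        )
        # Extract movie titles and links
        for movie in driver.find_elements(By.CLASS_NAME, "movie-item"):
            title = movie.find_element(By.CLASS_NAME, "movie-title").text.strip()
            link = movie.find_element(By.TAG_NAME, "a").get_attribute("href")
            print(f"Title: {title}, Link: {link}")
    finally:
        # Close the browser even if a lookup fails
        driver.quit()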
    

    For sites with infinite scrolling or pagination, Selenium’s scrolling functions or automated navigation can help load additional content dynamically. Ensure you comply with legal and ethical guidelines when scraping. How do you handle CAPTCHA challenges that might appear during scraping?
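    One common way to handle infinite scrolling is to keep scrolling to the bottom until the page height stops growing. The sketch below assumes `driver` is a Selenium WebDriver (it only uses `execute_script`); the function name, the pause length, and the round limit are illustrative choices, not something the target site prescribes.

```python
import time


def scroll_until_stable(driver, pause_seconds=1.0, max_rounds=20):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls that loaded new content.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    loaded = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # give lazy-loaded items time to render
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new appeared, so assume the end of the list
        last_height = new_height
        loaded += 1
    return loaded
```

    Call it after `driver.get(...)` and before collecting elements. The `max_rounds` cap keeps the loop from running forever on a feed that never ends.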

  • Olga Silvester

    Member
    12/11/2024 at 9:59 am

    I regularly update the bot by testing it on the target websites. Using flexible selectors, like XPath based on attributes, makes the bot adaptable to minor changes.
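    To illustrate attribute-based XPath, here is a minimal sketch using the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. The markup and the `data-type` attribute are made up for the example; with Selenium, the same kind of expression would be passed to `By.XPATH`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a listing page.
SAMPLE = """
<div>
  <div class="card card--wide" data-type="movie">
    <a href="/movie/1" title="First Film">First Film</a>
  </div>
  <div class="promo" data-type="ad">
    <a href="/ads/9" title="Sponsored">Sponsored</a>
  </div>
  <div class="tile-v2" data-type="movie">
    <a href="/movie/2" title="Second Film">Second Film</a>
  </div>
</div>
"""


def extract_movies(markup):
    """Return (title, href) pairs for elements marked data-type='movie'."""
    root = ET.fromstring(markup)
    results = []
    # Match on a stable attribute rather than on styling classes,
    # so a class rename such as "card" -> "tile-v2" does not break the lookup.
    for card in root.findall(".//div[@data-type='movie']"):
        link = card.find("./a[@href]")
        if link is not None:
            results.append((link.get("title"), link.get("href")))
    return results
```

    Both movie cards are found even though their class names differ, while the ad block is skipped.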

  • Khordad Leto

    Member
    12/11/2024 at 11:10 am

    Implementing error handling and retries ensures the scraper doesn’t fail entirely when a single request or element retrieval encounters an issue.
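    A rough sketch of both ideas, retrying a failing call and isolating per-item failures, might look like this. The helper names `with_retries` and `scrape_all` and the exponential backoff schedule are assumptions for illustration; tune the attempt count and delays for your own scraper.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call action(); on failure wait base_delay * 2**n, then try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))


def scrape_all(items, parse):
    """Parse each item independently so one bad item does not stop the run."""
    parsed, failed = [], []
    for item in items:
        try:
            parsed.append(parse(item))
        except Exception as exc:
            failed.append((item, exc))  # keep the error for later inspection
    return parsed, failed
```

    For example, `with_retries(lambda: driver.get(url))` would retry a page load, and `scrape_all(cards, parse_card)` would collect failures instead of aborting on the first missing element (`driver`, `cards`, and `parse_card` are placeholders here).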

  • Afnan Ayumi

    Member
    12/14/2024 at 6:04 am

    To handle CAPTCHAs, I integrate third-party solving services like 2Captcha, though I aim to avoid triggering CAPTCHAs by reducing request frequency and mimicking real user behavior.

  • Jochem Gunvor

    Member
    12/14/2024 at 6:52 am

    I use Selenium’s ActionChains to simulate user interactions, like mouse movements and clicks, which can reduce the chance of detection and make CAPTCHA challenges less likely to appear.

  • Herleva Davor

    Member
    12/18/2024 at 6:21 am

    Implementing proxy rotation and adding randomized delays between interactions reduces the likelihood of being flagged, ensuring smoother scraping sessions.
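    As a minimal sketch of that approach, the code below cycles through a proxy list and picks a random pause for each request. The proxy addresses, the delay bounds, and the `fetch(url, proxy)` callback are all placeholders; the actual request logic is left to whatever client you use.

```python
import itertools
import random
import time

# Hypothetical proxy addresses; substitute ones you are authorized to use.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]


def random_delay(min_seconds=2.0, max_seconds=5.0):
    """Pick a delay uniformly between the two bounds."""
    return random.uniform(min_seconds, max_seconds)


def plan_requests(urls, proxies=PROXIES, min_seconds=2.0, max_seconds=5.0):
    """Yield (url, proxy, delay) so each request gets the next proxy in turn."""
    rotation = itertools.cycle(proxies)
    for url in urls:
        yield url, next(rotation), random_delay(min_seconds, max_seconds)


def run(urls, fetch, sleep=time.sleep):
    """Call fetch(url, proxy) for each URL, pausing a random interval after each."""
    results = []
    for url, proxy, delay in plan_requests(urls):
        results.append(fetch(url, proxy))
        sleep(delay)
    return results
```

    Passing `sleep` as a parameter makes the loop easy to test without actually waiting.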

  • Ammar Saiful

    Member
    12/19/2024 at 10:37 am

    For sites with strict anti-scraping measures, I monitor network requests in the browser’s developer tools to identify potential API endpoints, which often provide the same data in a simpler JSON format.
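    If such an endpoint exists, parsing its response is usually simpler than walking the DOM. The sketch below works on a made-up response body; the `results`, `title`, and `url` field names and the base URL are assumptions, since the real payload has to be read off the network tab.

```python
import json
from urllib.parse import urljoin

# Hypothetical response body; a real endpoint's field names will differ.
SAMPLE_RESPONSE = """
{
  "page": 1,
  "results": [
    {"title": "First Film", "url": "/movie/1"},
    {"title": "  Second Film ", "url": "/movie/2"},
    {"title": "No Link"}
  ]
}
"""


def parse_movies(body, base_url="https://example.com"):
    """Return (title, absolute_link) pairs from a JSON listing response."""
    data = json.loads(body)
    movies = []
    for entry in data.get("results", []):
        title = (entry.get("title") or "").strip()
        path = entry.get("url")
        if not title or not path:
            continue  # skip incomplete entries instead of failing
        movies.append((title, urljoin(base_url, path)))
    return movies
```

    Entries without a link are skipped, and relative paths are resolved against the base URL.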

  • Gohar Maksimilijan

    Member
    03/19/2025 at 3:23 pm

    Unblock YesMovies.org Using Proxies and Automate Scraping Movies Using Java and MySQL

    Setting Up Java for Web Scraping

    Before scraping movie data from YesMovies.org, add the required dependencies to your Maven pom.xml:

    <dependencies>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.15.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents.client5</groupId>
            <artifactId>httpclient5</artifactId>
            <version>5.2</version>
        </dependency>
    </dependencies>

    Configuring a Proxy in Java

    To unblock YesMovies.org, configure Java to use a proxy:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.*;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class YesMoviesUnblocker {
        public static void main(String[] args) {
            // Placeholder proxy settings; replace with your own proxy
            String proxyHost = "your.proxy.server";
            int proxyPort = 8080;
            try {
                Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
                URL url = new URL("https://yesmovies.org");
                // Route this connection through the proxy
                HttpURLConnection connection = (HttpURLConnection) url.openConnection(proxy);
                connection.setRequestMethod("GET");
                connection.setRequestProperty("User-Agent", "Mozilla/5.0");
                // Parse the response and close the stream when done
                try (InputStream in = connection.getInputStream()) {
                    Document doc = Jsoup.parse(in, "UTF-8", url.toString());
                    System.out.println(doc.title());
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    Scraping Movie Information from YesMovies

    Once unblocked, extract movie details such as titles, ratings, and descriptions using Java and JSoup.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import java.io.IOException;

    public class YesMoviesScraper {
        public static void main(String[] args) {
            String url = "https://yesmovies.org/movies";
            try {
                // Fetch and parse the listing page
                Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0").get();
                // Selectors are placeholders; adjust them to the actual page markup
                Elements movies = doc.select(".movie-item");
                for (Element movie : movies) {
                    String title = movie.select(".movie-title").text();
                    String rating = movie.select(".rating").text();
                    String description = movie.select(".description").text();
                    System.out.println("Title: " + title);
                    System.out.println("Rating: " + rating);
                    System.out.println("Description: " + description);
                    System.out.println("-----------------------------");
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    Storing Scraped Movie Data in MySQL

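    To store the extracted movie data, set up a MySQL database. The JDBC code in this post also needs the MySQL Connector/J driver on the classpath, which the dependency list above does not include.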

    Creating the MySQL Database and Table

    CREATE DATABASE MovieScraperDB;
    USE MovieScraperDB;

    CREATE TABLE movies (
        id INT AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(255),
        rating VARCHAR(10),
        description TEXT,
        timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );

    Inserting Movie Data into MySQL

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class MovieDatabaseHandler {
        // Placeholder connection settings; replace with your own credentials
        private static final String DB_URL = "jdbc:mysql://localhost:3306/MovieScraperDB";
        private static final String USER = "root";
        private static final String PASSWORD = "password";

        public static void saveMovies(Document doc) {
            String sql = "INSERT INTO movies (title, rating, description) VALUES (?, ?, ?)";
            // try-with-resources closes the connection and statement automatically
            try (Connection conn = DriverManager.getConnection(DB_URL, USER, PASSWORD);
                 PreparedStatement stmt = conn.prepareStatement(sql)) {
                Elements movies = doc.select(".movie-item");
                for (Element movie : movies) {
                    String title = movie.select(".movie-title").text();
                    String rating = movie.select(".rating").text();
                    String description = movie.select(".description").text();
                    // Bind values as parameters rather than concatenating them into the SQL
                    stmt.setString(1, title);
                    stmt.setString(2, rating);
                    stmt.setString(3, description);
                    stmt.executeUpdate();
                }
                System.out.println("Movie data stored successfully.");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    Handling Anti-Scraping Measures

    YesMovies.org may implement anti-scraping measures, so follow best practices to avoid detection:

    • Use Rotating Proxies: Rotate IP addresses to prevent getting blocked.
    • Set User-Agent Headers: Mimic a real browser.
    • Introduce Random Delays: Avoid sending requests too quickly to prevent rate-limiting.
    • Use Headless Browsing: If needed, Selenium can be used for JavaScript-heavy pages.
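    For readers following the Python side of this thread, the User-Agent point can be sketched with the standard library alone. The header string and the `build_request` helper are assumptions for illustration; the example only constructs the request and does not send it.

```python
import urllib.request

# Assumed header value; any realistic browser string can be substituted.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)"


def build_request(url, user_agent=USER_AGENT):
    """Create a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

    The request can then be sent with `urllib.request.urlopen(build_request(url))`, ideally combined with a pause between calls to stay under any rate limit.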

    Conclusion

    Unblocking YesMovies.org using proxies allows access to its content from restricted regions. By automating the data extraction process with Java and JSoup, users can scrape movie details efficiently. Storing the extracted data in MySQL ensures easy retrieval and analysis. Implementing best practices such as rotating proxies, request throttling, and setting user-agent headers helps avoid detection while scraping YesMovies. Whether for research, database creation, or personal use, automated scraping can make movie data collection seamless and efficient.
