How to scrape movie titles and links on YesMovies.org (unblocked) using Python?
Posted by Eulogia Suad on 12/11/2024 at 8:22 am
Scraping movie titles and links from YesMovies.org (unblocked) can help gather data for personal use, such as creating a watchlist or analyzing trends. However, given that sites like YesMovies often employ anti-scraping measures and dynamic JavaScript content rendering, Python with Selenium is a reliable choice for handling these challenges. Start by analyzing the page structure to locate the classes or IDs that house the movie titles and links. Selenium can automate interactions like scrolling or clicking to ensure all content is loaded before scraping.
Here’s an example of using Selenium to scrape movie titles and links:
from selenium import webdriver
from selenium.webdriver.common.by import By

# Initialize the WebDriver
driver = webdriver.Chrome()

# Let element lookups wait up to 10 seconds for matches to appear
driver.implicitly_wait(10)

try:
    driver.get("https://example.com/movies")

    # Extract movie titles and links
    movies = driver.find_elements(By.CLASS_NAME, "movie-item")
    for movie in movies:
        title = movie.find_element(By.CLASS_NAME, "movie-title").text.strip()
        link = movie.find_element(By.TAG_NAME, "a").get_attribute("href")
        print(f"Title: {title}, Link: {link}")
finally:
    # Close the browser even if a lookup fails
    driver.quit()
For sites with infinite scrolling or pagination, Selenium’s scrolling functions or automated navigation can help load additional content dynamically. Ensure you comply with legal and ethical guidelines when scraping. How do you handle CAPTCHA challenges that might appear during scraping?
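For the infinite-scrolling case, one possible approach is to keep scrolling until the page height stops growing. The sketch below assumes a Selenium-style driver that exposes `execute_script`; the function name `scroll_until_stable` and its parameters are made up for illustration, and the right pause length depends on how fast the target page loads new items.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20, sleep=time.sleep):
    """Scroll to the bottom until the page height stops growing.

    `driver` is any object with Selenium's `execute_script` method.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)  # give lazy-loaded items time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number  # nothing new loaded, so stop
        last_height = new_height
    return max_rounds  # hit the safety cap without the height settling
```

Calling `scroll_until_stable(driver)` before `find_elements` should leave more items in the DOM; the `max_rounds` cap keeps a page that never stops growing from looping forever.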
7 Replies
-
I regularly update the bot by testing it on the target websites. Using flexible selectors, like XPath based on attributes, makes the bot adaptable to minor changes.
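To illustrate the attribute-based selectors mentioned above, here is a small sketch that runs on a static sample with the standard library's `xml.etree.ElementTree`, which supports a subset of XPath. The markup and the `data-type` and `data-role` attribute names are invented for the example and are not taken from any real site. With Selenium, the same expressions could be passed to `find_elements(By.XPATH, ...)`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: class names may change between redesigns,
# while data-* attributes tend to be more stable.
SAMPLE = """
<div>
  <div class="card-v2" data-type="movie">
    <a data-role="title-link" href="/movie/1">First Film</a>
  </div>
  <div class="card-v2" data-type="movie">
    <a data-role="title-link" href="/movie/2">Second Film</a>
  </div>
  <div class="card-v2" data-type="ad">
    <a data-role="promo" href="/ads/9">Sponsored</a>
  </div>
</div>
"""


def extract_movies(markup):
    """Return (title, href) pairs selected by attribute, not by class name."""
    root = ET.fromstring(markup.strip())
    results = []
    for card in root.findall(".//div[@data-type='movie']"):
        link = card.find(".//a[@data-role='title-link']")
        if link is not None and link.text:
            results.append((link.text.strip(), link.get("href")))
    return results


print(extract_movies(SAMPLE))
```

Because the selectors ignore the `class` attribute, renaming `card-v2` would not break the extraction in this sample.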
-
Implementing error handling and retries ensures the scraper doesn’t fail entirely when a single request or element retrieval encounters an issue.
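A minimal sketch of that retry idea, assuming the failing step can be wrapped in a zero-argument callable. The helper name `with_retries` and its defaults are illustrative choices, not part of Selenium or any other library.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `action()` up to `attempts` times with exponential backoff.

    Returns the first successful result; re-raises the last error
    if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # narrow this to the errors you expect
            last_error = error
            if attempt < attempts:
                # Wait 1x, 2x, 4x ... the base delay between attempts
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error
```

Wrapping a single lookup, for example `with_retries(lambda: movie.find_element(By.CLASS_NAME, "movie-title").text)`, would let one flaky element fail a few times without ending the whole run.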
-
To handle CAPTCHAs, I integrate third-party solving services like 2Captcha, though I aim to avoid triggering CAPTCHAs by reducing request frequency and mimicking real user behavior.
-
I use Selenium’s ActionChains to simulate user interactions, like mouse movements and clicks, which help avoid detection and prevent CAPTCHA challenges from appearing.
-
Implementing proxy rotation and adding randomized delays between interactions reduces the likelihood of being flagged, ensuring smoother scraping sessions.
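The randomized-delay half of this can be sketched with the standard library alone; proxy rotation depends on the provider and is not shown. The helper names `jittered_delay` and `visit_all`, and the 2 to 3.5 second range, are assumptions chosen for illustration. `fetch` stands for whatever function loads a page.

```python
import random
import time


def jittered_delay(base=2.0, jitter=1.5, rng=random):
    """Return a delay of `base` seconds plus a random extra up to `jitter`."""
    return base + rng.uniform(0, jitter)


def visit_all(urls, fetch, sleep=time.sleep, rng=random):
    """Call `fetch(url)` for each URL, pausing a random interval in between."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pace requests so the site is not hit in rapid bursts
            sleep(jittered_delay(rng=rng))
        results.append(fetch(url))
    return results
```

Keeping the pause outside `fetch` means the same pacing can be reused whether pages are loaded with Selenium or with a plain HTTP client.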
-
For sites with strict anti-scraping measures, I monitor network requests to identify potential API endpoints, which often provide the same data in a simpler, JSON format.
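Once such an endpoint is found, parsing its response is usually simpler than parsing HTML. The sketch below works on a hard-coded sample; the `results`, `title`, and `slug` field names and the `base_url` are hypothetical, since the real shape depends on what the browser's network tab shows.

```python
import json

# Hypothetical response body, standing in for what an API endpoint might return
SAMPLE_RESPONSE = """
{"results": [
  {"title": "First Film", "slug": "first-film"},
  {"title": " Second Film ", "slug": "second-film"},
  {"slug": "missing-title"}
]}
"""


def parse_movies(payload, base_url="https://example.com/movie/"):
    """Turn a JSON payload into a list of {'title', 'link'} dictionaries."""
    data = json.loads(payload)
    movies = []
    for item in data.get("results", []):
        title = (item.get("title") or "").strip()
        slug = item.get("slug")
        if title and slug:  # skip incomplete records instead of failing
            movies.append({"title": title, "link": base_url + slug})
    return movies


for movie in parse_movies(SAMPLE_RESPONSE):
    print(movie["title"], movie["link"])
```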
-
Unblock YesMovies.org Using Proxies and Automate Scraping Movies Using Java and MySQL
Setting Up Java for Web Scraping
Before scraping movie data from YesMovies.org, add the required dependencies to your Maven pom.xml:
<dependencies>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.15.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents.client5</groupId>
        <artifactId>httpclient5</artifactId>
        <version>5.2</version>
    </dependency>
</dependencies>
Configuring a Proxy in Java
To unblock YesMovies.org, configure Java to use a proxy:
import java.io.IOException;
import java.net.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class YesMoviesUnblocker {
    public static void main(String[] args) {
        String proxyHost = "your.proxy.server";
        int proxyPort = 8080;
        try {
            Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
            URL url = new URL("https://yesmovies.org");
            HttpURLConnection connection = (HttpURLConnection) url.openConnection(proxy);
            connection.setRequestMethod("GET");
            connection.setRequestProperty("User-Agent", "Mozilla/5.0");
            Document doc = Jsoup.parse(connection.getInputStream(), "UTF-8", url.toString());
            System.out.println(doc.title());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Scraping Movie Information from YesMovies
Once unblocked, extract movie details such as titles, ratings, and descriptions using Java and JSoup.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;

public class YesMoviesScraper {
    public static void main(String[] args) {
        String url = "https://yesmovies.org/movies";
        try {
            Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0").get();
            Elements movies = doc.select(".movie-item");
            for (Element movie : movies) {
                String title = movie.select(".movie-title").text();
                String rating = movie.select(".rating").text();
                String description = movie.select(".description").text();
                System.out.println("Title: " + title);
                System.out.println("Rating: " + rating);
                System.out.println("Description: " + description);
                System.out.println("-----------------------------");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Storing Scraped Movie Data in MySQL
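To store the extracted movie data, set up a MySQL database. The JDBC code in this section also needs the MySQL Connector/J driver on the classpath, which the dependency list above does not include.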
Creating the MySQL Database and Table
CREATE DATABASE MovieScraperDB;
USE MovieScraperDB;

CREATE TABLE movies (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    rating VARCHAR(10),
    description TEXT,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Inserting Movie Data into MySQL
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MovieDatabaseHandler {
    private static final String DB_URL = "jdbc:mysql://localhost:3306/MovieScraperDB";
    private static final String USER = "root";
    private static final String PASSWORD = "password";

    public static void saveMovies(Document doc) {
        String sql = "INSERT INTO movies (title, rating, description) VALUES (?, ?, ?)";
        try (Connection conn = DriverManager.getConnection(DB_URL, USER, PASSWORD);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            Elements movies = doc.select(".movie-item");
            for (Element movie : movies) {
                String title = movie.select(".movie-title").text();
                String rating = movie.select(".rating").text();
                String description = movie.select(".description").text();
                stmt.setString(1, title);
                stmt.setString(2, rating);
                stmt.setString(3, description);
                stmt.executeUpdate();
            }
            System.out.println("Movie data stored successfully.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Handling Anti-Scraping Measures
YesMovies.org may implement anti-scraping measures, so follow best practices to avoid detection:
- Use Rotating Proxies: Rotate IP addresses to prevent getting blocked.
- Set User-Agent Headers: Mimic a real browser.
- Introduce Random Delays: Avoid sending requests too quickly to prevent rate-limiting.
- Use Headless Browsing: If needed, Selenium can be used for JavaScript-heavy pages.
Conclusion
Unblocking YesMovies.org using proxies allows access to its content from restricted regions. By automating the data extraction process with Java and JSoup, users can scrape movie details efficiently. Storing the extracted data in MySQL ensures easy retrieval and analysis. Implementing best practices such as rotating proxies, request throttling, and setting user-agent headers helps avoid detection while scraping YesMovies. Whether for research, database creation, or personal use, automated scraping can make movie data collection seamless and efficient.