

Gohar Maksimilijan
Forum Replies Created
-
Gohar Maksimilijan
Member
03/19/2025 at 3:23 pm in reply to: How to scrape movie titles and links on YesMovies.org (unblocked) using Python?
Unblock YesMovies.org Using Proxies and Automate Scraping Movies Using Java and MySQL
Setting Up Java for Web Scraping
Before scraping movie data from YesMovies.org, add the required dependencies to your Maven pom.xml:
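<dependencies>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.15.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents.client5</groupId>
        <artifactId>httpclient5</artifactId>
        <version>5.2</version>
    </dependency>
    <!-- JDBC driver, needed by the MySQL code later in this post -->
    <dependency>
        <groupId>com.mysql</groupId>
        <artifactId>mysql-connector-j</artifactId>
        <version>8.0.33</version>
    </dependency>
</dependencies>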
Configuring a Proxy in Java
To unblock YesMovies.org, configure Java to use a proxy:
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class YesMoviesUnblocker {
    public static void main(String[] args) {
        // Placeholder values: replace with your proxy's host and port.
        String proxyHost = "your.proxy.server";
        int proxyPort = 8080;

        try {
            Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
            URL url = new URL("https://yesmovies.org");

            // Route this connection through the proxy.
            HttpURLConnection connection = (HttpURLConnection) url.openConnection(proxy);
            connection.setRequestMethod("GET");
            connection.setRequestProperty("User-Agent", "Mozilla/5.0");

            // Close the response stream once the page has been parsed.
            try (InputStream in = connection.getInputStream()) {
                Document doc = Jsoup.parse(in, "UTF-8", url.toString());
                System.out.println(doc.title());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Scraping Movie Information from YesMovies
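Once the site is reachable, extract movie details such as titles, ratings, and descriptions using Java and JSoup. The CSS selectors in the code (.movie-item, .movie-title, .rating, .description) are placeholders; inspect the live page and adjust them to its actual markup.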
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class YesMoviesScraper {
    public static void main(String[] args) {
        String url = "https://yesmovies.org/movies";

        try {
            // Fetch and parse the listing page.
            Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0").get();

            // Each matching element is treated as one movie entry.
            Elements movies = doc.select(".movie-item");
            for (Element movie : movies) {
                String title = movie.select(".movie-title").text();
                String rating = movie.select(".rating").text();
                String description = movie.select(".description").text();

                System.out.println("Title: " + title);
                System.out.println("Rating: " + rating);
                System.out.println("Description: " + description);
                System.out.println("-----------------------------");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
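The original question also asks for links, and href values on listing pages are often relative. A minimal sketch of turning them into absolute URLs with the standard library is shown here; the LinkResolver class and the example paths are hypothetical, not taken from the site. (With JSoup itself, element.absUrl("href") does a similar job.)

```java
import java.net.URI;

// Hypothetical helper: resolves an href against the URL of the page it was found on.
class LinkResolver {

    // Returns an absolute URL. Already-absolute hrefs are returned unchanged.
    // URI.create throws IllegalArgumentException for malformed input (e.g. raw spaces).
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "https://yesmovies.org/movies";
        // The paths below are made-up examples.
        System.out.println(toAbsolute(page, "/movie/example-title"));
        System.out.println(toAbsolute(page, "https://yesmovies.org/movie/other"));
    }
}
```

Storing absolute URLs keeps the saved links usable even when they are read back outside the context of the page they came from.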
Storing Scraped Movie Data in MySQL
To store the extracted movie data, set up a MySQL database.
Creating the MySQL Database and Table
CREATE DATABASE MovieScraperDB;
USE MovieScraperDB;

CREATE TABLE movies (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    rating VARCHAR(10),
    description TEXT,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Inserting Movie Data into MySQL
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MovieDatabaseHandler {
    // Example credentials: replace with your own, and avoid hard-coding them in real use.
    private static final String DB_URL = "jdbc:mysql://localhost:3306/MovieScraperDB";
    private static final String USER = "root";
    private static final String PASSWORD = "password";

    public static void saveMovies(Document doc) {
        String sql = "INSERT INTO movies (title, rating, description) VALUES (?, ?, ?)";

        // try-with-resources closes the connection and statement automatically.
        try (Connection conn = DriverManager.getConnection(DB_URL, USER, PASSWORD);
             PreparedStatement stmt = conn.prepareStatement(sql)) {

            Elements movies = doc.select(".movie-item");
            for (Element movie : movies) {
                String title = movie.select(".movie-title").text();
                String rating = movie.select(".rating").text();
                String description = movie.select(".description").text();

                // Bind values as parameters rather than concatenating them into the SQL.
                stmt.setString(1, title);
                stmt.setString(2, rating);
                stmt.setString(3, description);
                stmt.executeUpdate();
            }
            System.out.println("Movie data stored successfully.");
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
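Because the table defines title as VARCHAR(255) and rating as VARCHAR(10), scraped text that is longer than the column can make an insert fail when MySQL runs in strict mode. One possible guard, sketched here with a hypothetical ColumnFit helper that is not part of the original code, is to trim and cap each value before binding it:

```java
// Hypothetical helper: trims scraped text and caps it to a column's length.
class ColumnFit {
    static final int TITLE_MAX = 255;  // matches movies.title VARCHAR(255)
    static final int RATING_MAX = 10;  // matches movies.rating VARCHAR(10)

    // Returns the trimmed value, cut to at most maxChars UTF-16 units.
    // null is passed through so the caller can still store SQL NULL.
    // Note: a plain substring can split a surrogate pair at the cut point.
    static String fit(String value, int maxChars) {
        if (value == null) {
            return null;
        }
        String trimmed = value.trim();
        return trimmed.length() <= maxChars ? trimmed : trimmed.substring(0, maxChars);
    }

    public static void main(String[] args) {
        System.out.println(fit("  8.5  ", RATING_MAX));
        System.out.println(fit("An Example Title", TITLE_MAX));
    }
}
```

In saveMovies, this would be used as stmt.setString(1, ColumnFit.fit(title, ColumnFit.TITLE_MAX)) and likewise for the rating.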
Handling Anti-Scraping Measures
YesMovies.org may implement anti-scraping measures, so follow best practices to avoid detection:
- Use Rotating Proxies: Rotate IP addresses to prevent getting blocked.
- Set User-Agent Headers: Mimic a real browser.
- Introduce Random Delays: Avoid sending requests too quickly to prevent rate-limiting.
- Use Headless Browsing: If needed, Selenium can be used for JavaScript-heavy pages.
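The random-delay point above could be sketched roughly as follows. The RequestThrottle class and the delay values are illustrative assumptions; suitable numbers depend on the site and on how many pages you fetch.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: waits a randomized interval between requests.
class RequestThrottle {

    // Picks a delay in the inclusive range [baseMillis, baseMillis + jitterMillis].
    static long nextDelayMillis(long baseMillis, long jitterMillis) {
        if (baseMillis < 0 || jitterMillis < 0) {
            throw new IllegalArgumentException("delays must be non-negative");
        }
        return baseMillis + ThreadLocalRandom.current().nextLong(jitterMillis + 1);
    }

    // Sleeps for one randomized interval.
    static void pause(long baseMillis, long jitterMillis) throws InterruptedException {
        Thread.sleep(nextDelayMillis(baseMillis, jitterMillis));
    }

    public static void main(String[] args) throws InterruptedException {
        for (int page = 1; page <= 3; page++) {
            System.out.println("Fetching page " + page); // the real request would go here
            pause(100, 100); // short values for the demo; use seconds in practice
        }
    }
}
```

Calling pause between page fetches spaces the requests out unevenly, which also keeps the load on the server low.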
Conclusion
Routing requests through a proxy makes YesMovies.org reachable from regions where it is restricted. Java and JSoup can then automate the extraction of movie details, and storing the results in MySQL makes them easy to query and analyze later. Rotating proxies, throttling requests, and setting User-Agent headers reduce the chance of being blocked while scraping. Whether the goal is research, building a database, or personal use, this setup keeps movie data collection largely automatic.
-
Gohar Maksimilijan
Member
12/10/2024 at 11:19 am in reply to: How to scrape API data using Node.js and node-fetch?
For infinite-scrolling pages, I use Capybara to simulate scrolling until all content has loaded. This helps ensure complete data extraction without missing content that only appears after scrolling.
-
Gohar Maksimilijan
Member
12/10/2024 at 11:18 am in reply to: How to scrape API data using Node.js and node-fetch?
I design my scraper with flexible CSS selectors or XPath queries that target attributes rather than static class names, making it easier to adapt to layout updates.