Replies – Discussions – Gohar Maksimilijan

Forum Replies Created

Gohar Maksimilijan

Member

03/19/2025 at 3:23 pm in reply to: How to scrape movie titles and links on YesMovies.org (unblocked) using Python?

Unblocking YesMovies.org with a Proxy and Scraping Movie Data with Java and MySQL

Setting Up Java for Web Scraping

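Before scraping movie data from YesMovies.org, add the required dependencies to your Maven pom.xml. jsoup parses the HTML, and the MySQL Connector/J driver is needed for the JDBC code in the MySQL section of this post:

 <dependencies>
   <dependency>
     <groupId>org.jsoup</groupId>
     <artifactId>jsoup</artifactId>
     <version>1.15.3</version>
   </dependency>
   <dependency>
     <groupId>org.apache.httpcomponents.client5</groupId>
     <artifactId>httpclient5</artifactId>
     <version>5.2</version>
   </dependency>
   <!-- JDBC driver required by DriverManager in the MySQL code -->
   <dependency>
     <groupId>com.mysql</groupId>
     <artifactId>mysql-connector-j</artifactId>
     <version>8.0.33</version>
   </dependency>
 </dependencies>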

Configuring a Proxy in Java

To unblock YesMovies.org, configure Java to use a proxy:

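 import java.io.IOException;
 import java.io.InputStream;
 import java.net.HttpURLConnection;
 import java.net.InetSocketAddress;
 import java.net.Proxy;
 import java.net.URL;

 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;

 public class YesMoviesUnblocker {
     public static void main(String[] args) {
         // Placeholder proxy details: replace with your own proxy host and port.
         String proxyHost = "your.proxy.server";
         int proxyPort = 8080;

         try {
             // Route the request through an HTTP proxy.
             Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
             URL url = new URL("https://yesmovies.org");
             HttpURLConnection connection = (HttpURLConnection) url.openConnection(proxy);
             connection.setRequestMethod("GET");
             connection.setRequestProperty("User-Agent", "Mozilla/5.0");

             // Parse the response body; try-with-resources closes the stream.
             try (InputStream body = connection.getInputStream()) {
                 Document doc = Jsoup.parse(body, "UTF-8", url.toString());
                 System.out.println(doc.title());
             } finally {
                 connection.disconnect();
             }
         } catch (IOException e) {
             e.printStackTrace();
         }
     }
 }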

Scraping Movie Information from YesMovies

Once unblocked, extract movie details such as titles, ratings, and descriptions using Java and JSoup.

 import java.io.IOException;

 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;
 import org.jsoup.nodes.Element;
 import org.jsoup.select.Elements;

 public class YesMoviesScraper {
     public static void main(String[] args) {
         String url = "https://yesmovies.org/movies";

         try {
             // To go through the proxy configured earlier, add .proxy(proxyHost, proxyPort) to this chain.
             Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0").get();

             // The class names below are placeholders; check them against the page's actual markup.
             Elements movies = doc.select(".movie-item");
             for (Element movie : movies) {
                 // select(...).text() returns an empty string when nothing matches.
                 String title = movie.select(".movie-title").text();
                 String rating = movie.select(".rating").text();
                 String description = movie.select(".description").text();

                 System.out.println("Title: " + title);
                 System.out.println("Rating: " + rating);
                 System.out.println("Description: " + description);
                 System.out.println("-----------------------------");
             }
         } catch (IOException e) {
             e.printStackTrace();
         }
     }
 }

Storing Scraped Movie Data in MySQL

To store the extracted movie data, set up a MySQL database.

Creating the MySQL Database and Table

 CREATE DATABASE MovieScraperDB;
 USE MovieScraperDB;

 CREATE TABLE movies (
     id INT AUTO_INCREMENT PRIMARY KEY,
     title VARCHAR(255),
     rating VARCHAR(10),
     description TEXT,
     timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
 );

Inserting Movie Data into MySQL

 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.PreparedStatement;
 import java.sql.SQLException;

 import org.jsoup.nodes.Document;
 import org.jsoup.nodes.Element;
 import org.jsoup.select.Elements;

 public class MovieDatabaseHandler {
     // Placeholder connection settings: replace with your own credentials.
     private static final String DB_URL = "jdbc:mysql://localhost:3306/MovieScraperDB";
     private static final String USER = "root";
     private static final String PASSWORD = "password";

     public static void saveMovies(Document doc) {
         String sql = "INSERT INTO movies (title, rating, description) VALUES (?, ?, ?)";

         // try-with-resources closes the statement and the connection.
         try (Connection conn = DriverManager.getConnection(DB_URL, USER, PASSWORD);
              PreparedStatement stmt = conn.prepareStatement(sql)) {

             Elements movies = doc.select(".movie-item");
             for (Element movie : movies) {
                 String title = movie.select(".movie-title").text();
                 String rating = movie.select(".rating").text();
                 String description = movie.select(".description").text();

                 // Bound parameters keep scraped text from being interpreted as SQL.
                 stmt.setString(1, title);
                 stmt.setString(2, rating);
                 stmt.setString(3, description);
                 stmt.executeUpdate();
             }
             System.out.println("Movie data stored successfully.");
         } catch (SQLException e) {
             e.printStackTrace();
         }
     }
 }

Handling Anti-Scraping Measures

YesMovies.org may implement anti-scraping measures, so follow best practices to avoid detection:

Use Rotating Proxies: Rotate IP addresses to prevent getting blocked.
Set User-Agent Headers: Mimic a real browser.
Introduce Random Delays: Avoid sending requests too quickly to prevent rate-limiting.
Use Headless Browsing: If needed, Selenium can be used for JavaScript-heavy pages.
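
The random-delay item above can be sketched with the standard library alone. This is a minimal illustration rather than a tuned throttling policy: the class name PoliteDelay, its method names, and the 200 to 500 ms range are assumptions made for the example, and the actual HTTP request is left out.

```java
import java.util.concurrent.ThreadLocalRandom;

public class PoliteDelay {
    /** Returns a random delay in milliseconds in the inclusive range [minMillis, maxMillis]. */
    public static long nextDelayMillis(long minMillis, long maxMillis) {
        if (minMillis < 0 || maxMillis < minMillis) {
            throw new IllegalArgumentException("expected 0 <= minMillis <= maxMillis");
        }
        // The upper bound of nextLong is exclusive, so add 1 to include maxMillis.
        return ThreadLocalRandom.current().nextLong(minMillis, maxMillis + 1);
    }

    /** Sleeps for a random interval; call this between two requests. */
    public static void pause(long minMillis, long maxMillis) throws InterruptedException {
        Thread.sleep(nextDelayMillis(minMillis, maxMillis));
    }

    public static void main(String[] args) throws InterruptedException {
        for (int page = 1; page <= 3; page++) {
            // The request itself is omitted in this sketch.
            System.out.println("Fetching page " + page);
            pause(200, 500); // assumed range; adjust to what the site tolerates
        }
    }
}
```

Calling pause(...) after each page fetch spreads requests out unevenly, which is gentler on the server than a tight loop.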

Conclusion

Routing requests through a proxy makes YesMovies.org reachable from regions where it is blocked. Java and jsoup handle the extraction of movie details, and MySQL keeps the results available for later retrieval and analysis. Rotating proxies, throttling requests, and setting a User-Agent header reduce the chance of being blocked while scraping. The same pipeline works for research, building a database, or personal use.

Gohar Maksimilijan

Member
12/10/2024 at 11:19 am in reply to: How to scrape API data using Node.js and node-fetch?

For infinite scrolling pages, I use Capybara to simulate scrolling until all content is loaded. This ensures complete data extraction without missing hidden user agent profiles.
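
The stopping rule behind "scroll until all content is loaded" can be sketched without any browser driver. Capybara is a Ruby tool; the Java sketch below only illustrates the loop, and everything in it is hypothetical: `scrollAndCount` stands in for whatever driver call scrolls the page and returns the number of items currently loaded, and `maxRounds` is an assumed safety cap.

```java
import java.util.function.IntSupplier;

public class ScrollUntilStable {
    /**
     * Calls scrollAndCount repeatedly until the reported item count stops
     * growing or maxRounds calls have been made. Returns the last count seen.
     */
    public static int loadAll(IntSupplier scrollAndCount, int maxRounds) {
        int previous = -1;
        for (int round = 0; round < maxRounds; round++) {
            int current = scrollAndCount.getAsInt();
            if (current == previous) {
                return current; // nothing new appeared after this scroll
            }
            previous = current;
        }
        return Math.max(previous, 0); // cap reached (or no rounds were allowed)
    }

    public static void main(String[] args) {
        // Fake page: each scroll loads 10 more items until 35 are present.
        int[] loaded = {0};
        int total = loadAll(() -> {
            loaded[0] = Math.min(loaded[0] + 10, 35);
            return loaded[0];
        }, 20);
        System.out.println(total); // prints 35
    }
}
```

With a real driver, the callback would also need to wait for the newly requested content to render before counting, otherwise the loop can stop too early.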
Gohar Maksimilijan

Member
12/10/2024 at 11:18 am in reply to: How to scrape API data using Node.js and node-fetch?

I design my scraper with flexible CSS selectors or XPath queries that target attributes rather than static class names, making it easier to adapt to layout updates.
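
As a rough illustration of targeting attributes instead of class names, here is a sketch that uses the XPath support in the Java standard library. The `data-role="movie-link"` attribute and the class AttributeSelectors are assumptions invented for the example, and the input must be well-formed XML, which real-world HTML often is not (an HTML parser would be needed there).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AttributeSelectors {
    /** Returns the href of every <a> whose data-role attribute equals "movie-link". */
    public static List<String> movieLinks(String xhtml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Match on the (hypothetical) data-role attribute, not on class names.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//a[@data-role='movie-link']/@href",
                    new InputSource(new StringReader(xhtml)),
                    XPathConstants.NODESET);
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                links.add(nodes.item(i).getNodeValue());
            }
            return links;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        String page =
              "<div>"
            + "<a class='card-v2 x91' data-role='movie-link' href='/movie/1'>First</a>"
            + "<a class='nav' href='/about'>About</a>"
            + "<a class='tile' data-role='movie-link' href='/movie/2'>Second</a>"
            + "</div>";
        System.out.println(movieLinks(page)); // prints [/movie/1, /movie/2]
    }
}
```

The two matching links carry different class names, yet both are found, which is why a selector keyed on a stable attribute tends to survive a restyling that renames classes.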