Easiest (legal) way to programmatically get the Google search result count?


Solution 1

/** @author RAJESH Kharche */
// Open NetBeans
// Choose Java -> Project
// Name it GoogleSearchAPP

package googlesearchapp;

import java.io.*;
import java.net.*;
import java.util.*;
import java.util.logging.Level;
import java.util.logging.Logger;

public class GoogleSearchAPP {
    public static void main(String[] args) {
        try (Scanner input = new Scanner(System.in)) {
            System.out.println("Enter query to search: ");
            // nextLine() keeps multi-word queries intact; next() would stop at the first space
            final String query = input.nextLine();
            final long result = getResultsCount(query);

            System.out.println("Results: " + result);
        } catch (IOException ex) {
            Logger.getLogger(GoogleSearchAPP.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    private static long getResultsCount(final String query) throws IOException {
        final URL url = new URL("https://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8"));
        final URLConnection connection = url.openConnection();

        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Google Chrome/36"); // browser name/version sent with the request

        // try-with-resources closes the scanner (and the response stream) on every exit path
        try (Scanner reader = new Scanner(connection.getInputStream(), "UTF-8")) {
            while (reader.hasNextLine()) { // scan the response line by line
                final String line = reader.nextLine();

                // skip lines until the one carrying the "resultStats" element
                if (!line.contains("\"resultStats\">"))
                    continue;

                // take the text after the marker up to the next tag and keep only its digits;
                // long rather than int, because counts can exceed Integer.MAX_VALUE
                return Long.parseLong(line.split("\"resultStats\">")[1].split("<")[0].replaceAll("[^\\d]", ""));
            }
        }
        return 0; // marker not found
    }
}

Solution 2

One option is to perform an actual Google search programmatically. The simplest way is to request the URL https://www.google.com/search?q=QUERY_HERE and scrape the result count off the returned page.

Here is a quick example of how to do that:

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.Scanner;

private static long getResultsCount(final String query) throws IOException {
    final URL url = new URL("https://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8"));
    final URLConnection connection = url.openConnection();
    connection.setConnectTimeout(60000);
    connection.setReadTimeout(60000);
    connection.addRequestProperty("User-Agent", "Mozilla/5.0");
    // try-with-resources closes the reader on every exit path
    try (Scanner reader = new Scanner(connection.getInputStream(), "UTF-8")) {
        while (reader.hasNextLine()) {
            final String line = reader.nextLine();
            if (!line.contains("<div id=\"resultStats\">"))
                continue;
            // long rather than int: counts can exceed Integer.MAX_VALUE
            return Long.parseLong(line.split("<div id=\"resultStats\">")[1].split("<")[0].replaceAll("[^\\d]", ""));
        }
    }
    return 0; // marker not found
}

For usage, you would do something like:

final long count = getResultsCount("horses");
System.out.println("Estimated number of results for horses: " + count);
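The extraction step in the code above can also be pulled out into a small helper that is easy to check without hitting the network. This is only a sketch: the class name `ResultStatsParser`, the method `parseCount`, and the sample line are made up for illustration, and the `resultStats` marker is taken from the answers here, so it may not match whatever markup Google serves today.

```java
import java.util.OptionalLong;

final class ResultStatsParser {
    // Marker copied from the answers above; the actual page markup is an assumption.
    private static final String MARKER = "\"resultStats\">";

    /** Returns the number found right after the marker, or empty if the line has none. */
    static OptionalLong parseCount(String line) {
        int start = line.indexOf(MARKER);
        if (start < 0) {
            return OptionalLong.empty();
        }
        // Text between the marker and the next tag, e.g. "About 1,230,000 results"
        String rest = line.substring(start + MARKER.length());
        int end = rest.indexOf('<');
        String text = end < 0 ? rest : rest.substring(0, end);
        String digits = text.replaceAll("[^\\d]", "");
        if (digits.isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(digits));
        } catch (NumberFormatException e) {
            return OptionalLong.empty(); // more digits than a long can hold
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not captured from a real response
        String sample = "<div id=\"resultStats\">About 25,270,000,000 results<nobr> (0.52 seconds)</nobr></div>";
        System.out.println(parseCount(sample).getAsLong());
    }
}
```

Returning an `OptionalLong` instead of `0` lets the caller tell "marker not found" apart from a genuine count of zero, and the sample value shows why `int` would not be enough.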
Author: Marcus

My interests include: usability of websites and other products; software development in Java and in Scala; web technologies.

Updated on August 02, 2022

Comments

  • Marcus
    Marcus almost 2 years

    I want to get the estimated result count for certain Google search engine queries (on the whole web) using Java code.

    I need to do only very few queries per day, so at first Google Web Search API, though deprecated, seemed good enough (see e.g. How can you search Google Programmatically Java API). But as it turned out, the numbers returned by this API are very different from those returned by www.google.com (see e.g. http://code.google.com/p/google-ajax-apis/issues/detail?id=32). So these numbers are pretty useless for me.

    I also tried Google Custom Search engine, which exhibits the same problem.

    What do you think is the simplest solution for my task?

  • Marcus
    Marcus almost 11 years
    Thanks, this looks good. But AFAIR the terms of service do not permit this. Do they? They say something like one must use only Google GUIs and/or APIs …
  • Josh M
    Josh M almost 11 years
    Surely that might be the case, but I think it depends on what your intent is. I'm not too sure if this is breaching any of their terms or not, but I think you should probably look into it to make sure it is safe.
  • Quickredfox
    Quickredfox over 10 years
    FYI. This approach eventually leads to a 503 Error and a captcha.
  • honk
    honk over 9 years
    Could you please add some explanation to your solution?
  • Rajesh Kharche
    Rajesh Kharche over 9 years
    Hey, if you want me to send you the content returned by the link in the object, I surely will.
  • honk
    honk over 9 years
    You seem to have reused the code from the answer by @JoshM. However, you have modified and extended the code. What was the reason to do so? What does your code do better or differently than that of @JoshM? This kind of explanation would help readers understand your solution.
  • David H. Bennett
    David H. Bennett about 9 years
    I just tried Josh's code, didn't work. Tried Rajesh's class and it did.
  • Roberto Bonini
    Roberto Bonini almost 9 years
    @DavidH.Bennett Any reason why the difference?
  • David H. Bennett
    David H. Bennett almost 9 years
    @RobertoBonini it was a q&d count I needed I didn't investigate :)
  • Rohit Nishad
    Rohit Nishad about 3 years
    It's not legal, I think it's a violation of Google TOS.
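Regarding the 503 mentioned in the comments: with the code in the answers, `connection.getInputStream()` throws an `IOException` once the server answers with an error status, so the failure looks like any other network problem. A minimal sketch of checking the status code first is below. The class `SearchStatusCheck` and its methods are hypothetical names, and treating only 503 as "blocked" is an assumption based on that one comment.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

final class SearchStatusCheck {
    /** True for the status reported in the comments (503 Service Unavailable). */
    static boolean isBlocked(int status) {
        return status == HttpURLConnection.HTTP_UNAVAILABLE;
    }

    /** Prepares the same request as the answers above, but typed as HttpURLConnection. */
    static HttpURLConnection open(String query) throws IOException {
        URL url = new URL("https://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8"));
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Mozilla/5.0");
        return connection;
    }

    public static void main(String[] args) {
        try {
            HttpURLConnection connection = open("horses");
            int status = connection.getResponseCode(); // sends the request
            if (isBlocked(status)) {
                System.err.println("Blocked with HTTP " + status + "; stopping instead of retrying");
            } else {
                System.out.println("HTTP " + status);
            }
            connection.disconnect();
        } catch (IOException ex) {
            System.err.println("Request failed: " + ex.getMessage());
        }
    }
}
```

Checking `getResponseCode()` before reading the body makes it possible to back off as soon as the block appears rather than parsing a captcha page or retrying blindly.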