Learn Google – Page 14 – Tips, tricks, and thoughts about Google, AdWords, Google Cloud Platform, and all its subsidiaries. Not affiliated with or sponsored by Google.

March 1, 2019

Finding Interesting Files – The Filetype: Operator

Sometimes a researcher needs to find something other than a web page. News releases and raw data are often published as PDF files. Microsoft PowerPoint files (.PPTX) are often used to outline new company initiatives. Microsoft Word files (.DOCX) are shared while text is being edited, approved, or discussed.

To find these files, the filetype: operator (or its alias, the ext: operator) can be used. For example, if I need to find official releases of employment data, a possible search would be one of the below:

employment data filetype:pdf
employment data ext:pdf

As the red boxes in the screenshot show, all the results are PDF files, just as the search query asked for.
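If you run these searches often, it can be handy to build the search URL in code. The snippet below is a minimal sketch, not an official Google API: build_search_url is a hypothetical helper name, and it assumes the familiar https://www.google.com/search?q=... URL form.

```python
from urllib.parse import urlencode


def build_search_url(terms, filetype=None):
    """Return a Google search URL for `terms`, optionally limited to one file type.

    Hypothetical helper for illustration only.
    """
    query = terms
    if filetype:
        # No space between the operator and its value: filetype:pdf
        query = f"{terms} filetype:{filetype.lstrip('.').lower()}"
    # urlencode escapes spaces and the colon so the query survives in a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(build_search_url("employment data", filetype="pdf"))
```

Paste the printed URL into a browser to run the same search as the examples above.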

February 28, 2019

The define: operator – A Replacement For The Dictionary

Google search is not just a great search engine, but also a great library of utility functions. An example of this is the define: operator.

The define: operator acts as a dictionary: it lets you ask for the definition of a word. For example, searching for the below text gives me the definition of this strange word:

define:defenestration

If you have a phrase you need to look up, feel free to throw it in as well. I wonder what this phrase means…

define:trip the light fantastic

I often use this function to look up domain-specific words, such as words used only in the legal or technology fields, and I’ve always found useful, intelligent definitions.

February 27, 2019

Limiting Your Search To A Single Site: The site: operator – Otherwise Known As My Favorite Operator

Perhaps the best-known and most-used operator is the site: operator, which limits a search to a single site. For example, if I wanted to find all Disney-related pages on Twitter, I might search for the query below (remember, no space between site: and the site you’re searching):

disney site:twitter.com

As you can see, all the results are on twitter.com.

This operator is really useful on large sites that have poor search functionality – for example, searching Javadocs or social media sites such as Reddit.

February 26, 2019 (updated February 27, 2019)

Finding Old/Historical/Archived Content – The Cache Operator & Archive Services

Is your bookmark leading to an empty webpage? Did that link you found on a forum post dated 5 years ago no longer work? Perhaps you need some information from a site and it’s currently down for maintenance?

Fortunately, Google has you covered. The cache: operator shows you the given web page as Google last saved it. Using it is easy: type cache: followed by the URL you need to see. Make sure there is no space between cache: and the address.

As an example, see below:

cache:reddit.com

After you hit the search button, you’ll get something similar to this:

On some occasions, Google won’t be able to find a cached page, and you’ll see an image similar to the below:

In these cases, it’s time to pop over to archive.org and use the Wayback Machine: put the URL you want into the Wayback Machine prompt:

You’ll see options to select a year and a specific date. Click the blue-circled dates to see the web page as it was on that date.

The Wayback Machine is useful for seeing historical snapshots of web pages as well, and seeing how web pages change through time.
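You can also skip the prompt and go straight to a snapshot by building the Wayback Machine address yourself. The snippet below is a small sketch under that assumption: wayback_url is a hypothetical helper name, and it relies on the web.archive.org/web/<date>/<url> address pattern, where the Wayback Machine redirects to the capture closest to the requested date and an asterisk in place of the date opens the calendar view.

```python
from datetime import date


def wayback_url(page_url, on=None):
    """Build a Wayback Machine address for `page_url`.

    Hypothetical helper for illustration only. With a date, the link points
    at the capture nearest that day; without one, it opens the calendar.
    """
    stamp = on.strftime("%Y%m%d") if on else "*"
    return f"https://web.archive.org/web/{stamp}/{page_url}"


print(wayback_url("reddit.com", on=date(2019, 2, 26)))
print(wayback_url("reddit.com"))
```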

February 25, 2019

Delete Old Entities – Java Datastore

This is an ultra-simplified example of how to delete old entities from the App Engine Datastore. The first three lines of code retrieve the current time, then subtract 60 days from it (the multiplication converts days to milliseconds). DATE_PROPERTY_ON_ENTITY is the date property on the entity: when first writing the entity to the datastore, add the current date as a property. ENTITY_KIND is the entity kind we’re deleting.

		//Calculate 60 days ago.
		long current_date_long = (new Date()).getTime();
		//The 1000L literal forces long arithmetic; as plain ints the product (5,184,000,000) overflows.
		long past_date_long = current_date_long - (1000L * 60 * 60 * 24 * 60);
		Date past_date = new Date(past_date_long);
		
		DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
		Query.Filter date_filter = new Query.FilterPredicate("DATE_PROPERTY_ON_ENTITY", Query.FilterOperator.LESS_THAN_OR_EQUAL, past_date);
		Query date_query = new Query("ENTITY_KIND").setFilter(date_filter);
		PreparedQuery date_query_results = datastore.prepare(date_query);
		
		Iterator<Entity> iterate_over_old_entities = date_query_results.asIterator();
		
		while (iterate_over_old_entities.hasNext()) {
			Entity old_entity = iterate_over_old_entities.next();
			
			System.out.println("Deleting: " + old_entity.getProperties());
			
			datastore.delete(old_entity.getKey());
		}

Note that this is a simplified function: it’s useful if you have a handful of entities that need deleting, but if you have more than a handful, you should switch to datastore cursors and page through the entities to delete.
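The paging idea can be sketched independently of the Datastore API. The Python below is only an illustration of the loop shape, not Datastore code: delete_in_pages, fake_fetch_page, and fake_delete_keys are hypothetical names, an in-memory dict stands in for the datastore, and the page size of 500 is an assumption you should tune.

```python
def delete_in_pages(fetch_page, delete_keys, page_size=500):
    """Delete everything a query returns, one page at a time.

    fetch_page(cursor, limit) must return (keys, next_cursor), where
    next_cursor is None once the query is exhausted.
    """
    deleted = 0
    cursor = None
    while True:
        keys, cursor = fetch_page(cursor, page_size)
        if keys:
            delete_keys(keys)  # one batch delete per page
            deleted += len(keys)
        if cursor is None:
            break
    return deleted


# In-memory stand-in for the datastore (hypothetical data).
old_entities = {i: f"entity-{i}" for i in range(1234)}


def fake_fetch_page(cursor, limit):
    start = -1 if cursor is None else cursor
    keys = sorted(k for k in old_entities if k > start)[:limit]
    # A short page means there is nothing left to fetch.
    next_cursor = keys[-1] if len(keys) == limit else None
    return keys, next_cursor


def fake_delete_keys(keys):
    for key in keys:
        del old_entities[key]


count = delete_in_pages(fake_fetch_page, fake_delete_keys)
print("Deleted", count, "entities")
```

In a real implementation, fetch_page would run the date query with a limit and a cursor, and delete_keys would issue a batch delete for the returned keys.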

February 24, 2019

PHP Post To PubSub

Today’s fragment is a rather large one, demonstrating how to post to Google PubSub. While there are libraries to handle this, I prefer to understand the low-level process so debugging is easier.

Note that this fragment is designed to run on App Engine, as it relies on the App Identity service to pull the credentials required to publish to PubSub. You only need to set up three values: $message_data, which should be a JSON-encodable object; NAMEOFGOOGLEPROJECT, which is the name of the Google project containing the PubSub topic you want to publish to; and NAMEOFPUBSUB, which is the topic name.

It isn’t required, but it is good practice to customize the User-Agent header below. I have it set to Publisher, but a production service should have it set to an appropriate custom name.

use google\appengine\api\app_identity\AppIdentityService;

//Build JSON object to post to Pubsub

$message_data_string = base64_encode(json_encode($message_data));

$single_message_attributes = array ("key" => "iana.org/language_tag",
    "value" => "en",
);

$single_message = array ("attributes" => $single_message_attributes,
    "data" => $message_data_string,
);
//The publish endpoint expects "messages" to be a list, even for a single message.
$messages = array ("messages" => array ($single_message));

//Post to Pubsub

$url = 'https://pubsub.googleapis.com/v1/projects/NAMEOFGOOGLEPROJECT/topics/NAMEOFPUBSUB:publish';

$pubsub_data = json_encode($messages);

syslog(LOG_INFO, "Pubsub Message: " . $pubsub_data);

$access_token = AppIdentityService::getAccessToken('https://www.googleapis.com/auth/pubsub');

//The body is JSON, and the OAuth 2.0 access token is sent as a Bearer token.
$headers = "accept: */*\r\n" .
    "Content-Type: application/json\r\n" .
    "User-Agent: Publisher\r\n" .
    "Authorization: Bearer " . $access_token['access_token'] . "\r\n" .
    "Custom-Header-Two: custom-value-2\r\n";

$context = [
    'http' => [
        'method' => 'POST',
        'header' => $headers,
        'content' => $pubsub_data,
    ]
];
$context = stream_context_create($context);
$result = file_get_contents($url, false, $context);

syslog(LOG_INFO, "Returning from PubSub: " . $result);
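When debugging, it helps to look at the request body on its own before any HTTP is involved. The Python below is a minimal sketch of that body only: build_publish_body is a hypothetical helper name and the sample payload is made up. It assumes the publish endpoint takes a JSON object with a messages list, where each message carries base64-encoded data and an optional flat map of string attributes.

```python
import base64
import json


def build_publish_body(message_data, attributes=None):
    """Return the JSON request body for a single-message publish call.

    Hypothetical helper for illustration only.
    """
    # The data field must be base64-encoded; here the payload is JSON first.
    data = base64.b64encode(json.dumps(message_data).encode("utf-8")).decode("ascii")
    message = {"data": data}
    if attributes:
        message["attributes"] = attributes  # string keys mapped to string values
    return json.dumps({"messages": [message]})


body = build_publish_body(
    {"event": "signup", "user_id": 42},  # made-up sample payload
    attributes={"iana.org/language_tag": "en"},
)
print(body)
```

Decoding the data field with base64 and then JSON should give back the original payload, which is a quick check that a subscriber will be able to read the message.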

February 23, 2019 (updated February 24, 2019)

Satellite Mode On Google Maps

Bret Taylor, a co-creator of Google Maps, recounted how Google Maps Satellite Mode was almost called Bird Mode. See a screenshot of the story below:

Read the HN discussion on this story here: https://news.ycombinator.com/item?id=19235017 .

February 22, 2019

Google Doodle

Today’s Google Doodle celebrates the 57th birthday of “Crocodile Hunter” Steve Irwin, who was a famous wildlife conservationist, zookeeper, and TV personality.

This is what the Google homepage looked like with the doodle:

Clicking on it goes to a slideshow showcasing many aspects of Steve Irwin’s life.

Many other organizations are also taking the opportunity to celebrate Steve’s life, such as Animal Planet on Twitter:

February 21, 2019

Serializing A Java Object To Google Cloud Storage – Java

A quick code example today: serializing a Java object to Google Cloud Storage. write_object stands for the object being written, which must implement Serializable. This code depends on the App Engine libraries for Java and the Google Cloud Storage libraries.

	GcsService gcs_service = GcsServiceFactory.createGcsService();
	GcsFilename gcs_filename = new GcsFilename(AppIdentityServiceFactory.getAppIdentityService().getDefaultGcsBucketName(), "subfolder_within_bucket" + "/" + "filename.extension");
	GcsFileOptions gcs_options = GcsFileOptions.getDefaultInstance();
	GcsOutputChannel output = gcs_service.createOrReplace(gcs_filename, gcs_options);
	ObjectOutputStream oos = new ObjectOutputStream(Channels.newOutputStream(output));
	oos.writeObject(write_object);
	oos.flush();
	oos.close();
	output.close();

February 20, 2019

Python SQLite Table Creation Template

I often use the Python sqlite3 module: it helps save time during development as it’s a lightweight SQL engine. Even in production, some small applications can get away with running SQLite instead of a full client-server SQL database.

To create a table in sqlite:

import sqlite3
def create_table():
    create_table_sql = """CREATE TABLE tweets (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE,
     posted_date DATETIME, tweet_text VARCHAR(300),
     user VARCHAR(20), retweet_count int, favorite_count int,
     original_tweet_id VARCHAR(20),
     original_user VARCHAR(20));"""
    conn = sqlite3.connect("example.db")
    c = conn.cursor()
    c.execute(create_table_sql)
    conn.commit()
    conn.close()
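A quick way to confirm that a table was created as intended is to ask SQLite for its column list. The sketch below is illustrative only: it uses an in-memory database and a shortened version of the tweets schema, and the helper names ensure_table and column_names are hypothetical. Adding IF NOT EXISTS makes the creation step safe to run more than once.

```python
import sqlite3

# Shortened schema for illustration; IF NOT EXISTS makes the statement re-runnable.
CREATE_TABLE_SQL = """CREATE TABLE IF NOT EXISTS tweets (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    posted_date DATETIME,
    tweet_text VARCHAR(300),
    user VARCHAR(20));"""


def ensure_table(conn):
    """Create the tweets table if it is not there yet."""
    conn.execute(CREATE_TABLE_SQL)
    conn.commit()


def column_names(conn, table_name):
    """Return the column names SQLite reports for `table_name`."""
    # PRAGMA table_info yields one row per column; the name is at index 1.
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    return [row[1] for row in rows]


conn = sqlite3.connect(":memory:")  # throwaway database, nothing written to disk
ensure_table(conn)
ensure_table(conn)  # a second call is a no-op rather than an error
columns = column_names(conn, "tweets")
conn.close()
print(columns)
```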

And to execute operations against the created table, you simply need to connect to example.db and run c.execute:

# Execute into sqlite
conn = sqlite3.connect("example.db")
c = conn.cursor()