Tips for finding web page modification and/or creation dates

September 26, 2012

The following are some ways to find, or at least estimate, the modification and/or creation date of a given web page:

1.) First and most obvious: look for dates on the web page itself.
You can get creative with your browser’s find-in-page feature here. For instance, search for a year such as “2012”.
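
If you have the page text saved or copied, the same year search can be scripted. The following is only a rough sketch: the function name `find_years`, the 1990 to 2099 range, and the sample text are assumptions made for illustration, not part of any tool mentioned here.

```python
import re

# Matches standalone four-digit years from 1990 to 2099 (an assumed range).
YEAR_PATTERN = re.compile(r"\b(?:199\d|20\d{2})\b")


def find_years(page_text):
    """Return the distinct years mentioned in page_text, sorted ascending."""
    return sorted({int(match) for match in YEAR_PATTERN.findall(page_text)})


# Hypothetical page text, used only to show the call.
sample = "Posted on September 26, 2012. Copyright 2010-2012 Example Blog."
print(find_years(sample))
```

The latest year found is usually a better hint for the modification date, and the earliest one for the creation date, but a copyright footer or a quoted date can easily mislead, so treat the result as a clue only.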

2.) Get the date of the last time Google indexed the page.
This date will fall somewhere between the “last modified” date itself and approximately three months after it, depending on where the page falls in Google’s indexing priority. Google revisits most of the pages in its index at least once every three months. The more popular the page, the closer this date will be to the actual “last modified” date. Google doesn’t update this date unless the content of the page has changed. For dynamically generated web pages, this is about as good as you can get if you are looking at the page as a whole.

The best way to do this is to enter the following in your browser’s address bar and look at the date shown next to the result:
https://www.google.com/search?q=inurl:<web address>&as_qdr=y15&safe=active

Here is an example:
https://www.google.com/search?q=inurl:https://codeinchinese.wordpress.com&as_qdr=y15&safe=active
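
If you check many pages, the search URL can be built with a few lines of Python. This is a minimal sketch: the function name `google_index_date_url` is made up for illustration, and the query parameters are simply copied from the URL pattern above. The only difference is that the web address is percent-encoded, which browsers normally do for you.

```python
from urllib.parse import urlencode


def google_index_date_url(page_url):
    """Build the Google search URL from the pattern above for page_url."""
    params = {
        "q": "inurl:" + page_url,  # restrict results to this web address
        "as_qdr": "y15",           # copied as-is from the pattern above
        "safe": "active",          # copied as-is from the pattern above
    }
    return "https://www.google.com/search?" + urlencode(params)


print(google_index_date_url("https://codeinchinese.wordpress.com"))
```

Paste the printed URL into the address bar and read the date next to the result, exactly as with the hand-written version.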

3.) Look at the HTTP headers.
For static web pages, or static pieces of web pages, you can sometimes get the “last modified” date and, if you are lucky, the “creation” date. If you want this information for the website as a whole, you can go to http://www.statscrop.com, enter the URL of the page you are interested in, and scroll down to the “HTTP Header Analysis” section. This will show you the header for the main response to the URL request.

Sometimes this information isn’t provided in the HTTP response header for that particular request, or the page has content that is dynamic enough to make it useless (i.e. it always returns today’s date). In that case you can look at other resource requests. The average page nowadays is actually made up of dozens of resources, each acquired by the browser with a separate request. The web server’s response to each of these requests contains its own response header. You can see these other headers by opening the Chrome developer tools, clicking on the Network tab, and then loading the page’s URL from the browser’s address bar. Select the resource on the left side that best matches the content you are interested in. It may be a picture, a CSS file, a PHP or HTML file, etc. Inside the Network tab there is another tab called Headers, where you are interested in the Response headers.

Note: not all web servers are configured to send a “creation” or “last modified” date in the HTTP response header. Don’t be surprised if you don’t see these fields.
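
The header check can also be done from a script instead of a website or the developer tools. Below is a minimal sketch using only the Python standard library. It reads the standard `Last-Modified` response header; the function names `parse_last_modified` and `fetch_last_modified` are made up for this example, and the sample header value is invented to show the format.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def parse_last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if it is
    missing or cannot be parsed."""
    value = headers.get("Last-Modified")
    if value is None:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


def fetch_last_modified(url, timeout=10):
    """Send a HEAD request to url and return its Last-Modified date, if any."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers)


# Invented header value, just to show the parsing step without a network call.
print(parse_last_modified({"Last-Modified": "Wed, 26 Sep 2012 14:30:00 GMT"}))
```

Calling `fetch_last_modified` with the URL of an image or CSS file from the page, rather than the page itself, mirrors the advice above about looking at other resource requests. Some servers reject HEAD requests, in which case a normal GET would be needed.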

4.) Other services
http://www.cubestat.com provides a service that shows some additional web page indexing information. I’m not entirely sure where it gets its data. I expected the dates to match what Google shows, but that isn’t always so. It looks at multiple indexers (Google, Yahoo, Live (Bing)). It could be getting the data from Alexa, Quantcast, or MajesticSEO. Or it could be running its own crawler, but that is doubtful.

5.) DNS registration
Another interesting piece of information is when the domain name was registered to a given IP address or owner. http://www.statscrop.com will also give you this information.
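
Registration dates are also published through WHOIS, so they can be looked up directly. The sketch below is an illustration under several assumptions: you have to know which WHOIS server handles the domain’s top-level domain, the field names vary between registries (the keyword list here is a guess at common ones), and the function names and the sample record are made up.

```python
import socket


def whois_query(domain, server, timeout=10):
    """Send a plain WHOIS query (TCP port 43) and return the response text."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_registration_lines(whois_text):
    """Return (field, value) pairs that look like a registration date."""
    keywords = ("creation date", "created", "registered on")  # assumed names
    matches = []
    for line in whois_text.splitlines():
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() in keywords and value.strip():
            matches.append((field.strip(), value.strip()))
    return matches


# Made-up WHOIS record, only to show the parsing step without a network call.
sample_record = (
    "Domain Name: EXAMPLE.TEST\n"
    "Creation Date: 2012-09-26T00:00:00Z\n"
    "Registrar: Hypothetical Registrar\n"
)
print(find_registration_lines(sample_record))
```

A real lookup would pass the output of `whois_query(domain, server)` to `find_registration_lines`. Keep in mind that this date tells you when the domain was registered, not when any particular page was created.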
