Thumbnail API

Website Thumbnail API


Thumbnail API is closing down soon!

August 2014:
The service provider has announced that "The server appears to have been inappropriately using system resources". Thus, it seems that this service will be closed soon. A "Service is closing soon" advertisement should now appear in place of all new thumbnail requests.

Perhaps this server, with its 2 GB of RAM, was getting too old.



Introduction to the free website thumbnail API

We provide simple interfaces for getting website thumbnails (some call them thumbshots, or website screenshots) onto your web pages. Please read the instructions carefully, so that you get a comprehensive understanding of how to use the service. In its simplest form, just replace the address after "URL=" with the URL you wish, include the line in your web page, put the page online, and you're done! Of course, you can always test the service offline (with the page containing the link not on the net, but perhaps on your PC):

<img src="http://immediatenet.com/t/m?Size=1024x768&URL=http://immediatenet.com/"/>

Open source or not?

As of Tue 29 Nov 2011, most of the source code has been made public right here - immediatenet.com open sourced!
It's "open sourced" because this is my hobby, but I haven't had much time to put into it lately. Another reason is that the service usage hits the ceiling many times a day (yes, it generates thousands upon thousands of new thumbnails per day, but occasionally many clients request them at exactly the same moment).

Preconditions

The service is meant to be used as stated here. It's not meant to be a service for getting thumbnails of websites in any manner other than the one we propose. Purposeful misuse is easily detected, and action may be taken on such abuse (including IP address banning at the firewall). Humans make mistakes, and mistakes are not a problem at all. Update, March 2012: Go ahead and download any generated images to your local machines, but in that case you should (not must; you don't have to) add a backlink to immediatenet.com. Please read and understand the site terms here.

Format

The API provides you with a .JPG image of the requested URL. The JPG image has a quality level of 90/100. The JPG is scaled directly from the virtual browser screen (for example, 1024x768 pixels), so there are no additional layers to degrade the quality. In error cases, there should be a JPG containing the error message.

Engine - queue-less, of course!

Our engine is based on parallel processing with the most modern technology out there. Unlike many other thumbnail providers, we do not have such concepts as 'queues'. Everything is handled in a true multi-processing environment. The only limitations are the number of open connections the web server (Apache) itself allows and how loaded the server is. Moreover, the process is lightweight, as we do not invalidate the virtual screens (and thus cause excessive processing) any more than is required. The engine was originally based on Firefox 4.0.1 and is now based on the WebKit. Interested in reading more about the technical details? Click here!

Cache policy

We wish to maintain up-to-date thumbnails. However, creating the same thumbnail every time a user hits your web page is far from optimal. We want to cache all entries, so that an already present thumbnail is retrieved from the cache instead of being generated from scratch. Our current cache policy is that all entries are stored for approximately 8 days. After that, the entry is invalidated and deleted from the cache. If it is then accessed again, it is generated again. Thus, the thumbnail should be fairly up-to-date, or 8 days old at most. The policy may change to whatever we feel is best. (Policy changed on Sep 05 from 5 days to 8 days). The cached thumbnails are not directly accessible by any means other than through this service. Some full sized images are copied into the "1000 pages" directory once in a while.
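The cache policy above can be sketched roughly as follows. This is only an illustration and not the actual server code: the dictionary cache, the `lookup` helper and the `render` callback are all hypothetical names, and the 8 day lifetime is simply taken from the text.

```python
from datetime import datetime, timedelta

# Assumption: entries live for roughly 8 days, as the policy text says.
CACHE_LIFETIME = timedelta(days=8)


def is_expired(created_at: datetime, now: datetime) -> bool:
    """Return True when a cached thumbnail is older than the cache lifetime."""
    return now - created_at > CACHE_LIFETIME


def lookup(cache: dict, url: str, now: datetime, render):
    """Serve a fresh entry from the cache, otherwise regenerate and store it.

    `cache` maps a URL to a (creation time, image bytes) tuple and `render`
    is a hypothetical callback that generates the thumbnail for a URL.
    """
    entry = cache.get(url)
    if entry is not None and not is_expired(entry[0], now):
        return entry[1]
    image = render(url)
    cache[url] = (now, image)
    return image
```

With this sketch, a request made 7 days after the thumbnail was created is served from the cache, while a request made 9 days after triggers a new rendering.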

Items are fetched from the cache through what I call "direct indexing". It just means that even if the server had 20-500 million thumbnails, the thumbnails are still served immediately (about the same speed as a static image is served by Apache).

Delays

Our server is located in the UK. There is just one at the moment. There may be delays in the service, due to the following reasons:

1. Where your pages / services / target audience are physically located
2. If the thumbnail is not found in the cache, it may take time to render it, depending on how long it takes to retrieve the page ( + network latency) and how loaded the server is. Our thumbnail generator engine is fast - but some pages may be slow to load!
3. Some other delays related to, but not limited to, service maintenance. Scheduled service maintenance takes place at most once a month and lasts at most 15 minutes (rebooting the server after security updates), but usually just 3 minutes.

If your target audience is in the UK (or close by, in western Europe), there is no better way to get thumbnails on your site! Even if your target audience is somewhere else, we should still be the fastest and most reliable service out there! However, due to strong growth, we may be setting up similar services throughout the world. So stay tuned and read the news right here!

Misses

Some pages may not look fine. For example, we do not support the secure https:// protocol at all. Furthermore, some pages may have start-up animations or unusual redirections, which may affect the look of the page, usually making it look completely blank. Always make sure the thumbnails look good on your page.
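Since https:// targets are not supported, it may be worth filtering them out before building a request. Here is a minimal sketch; the function name `is_supported_target` is made up for this example and is not part of the API.

```python
from urllib.parse import urlparse


def is_supported_target(target: str) -> bool:
    """Return False for https:// targets, which the service cannot render."""
    # urlparse() lowercases the scheme, so "HTTPS://..." is caught as well.
    return urlparse(target).scheme != "https"
```

For example, `is_supported_target("https://example.com/")` returns `False`, while `is_supported_target("http://immediatenet.com/")` returns `True`.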

Error scenarios

Our practice is to provide a jpg image out of the service - no matter what. If there was an error, the engine should return a jpg containing a string that describes the error in more detail. Usually an error occurs because the service parameters were not obeyed (misspelled or mistyped). If the server is busy creating new thumbnails (yes, we cannot handle an unlimited amount of requests, and these days the server generates 30000+ thumbnails daily), and the one you're requesting is not found in the cache (it needs to be created but there's no CPU power available at the moment), it will just give you a message like the one seen below:

server busy scenario

One common mistake appears to be the mishandling of escaped http links. Notice that the API does NOT unescape strings - which may lead to a wrong location! It's your task to do proper unescaping when necessary (convert the %xx values to ASCII characters). We do not unescape, because it would require extra processing. However, if there is a need for such a service, please let me know (find the email address far below at Feedback / Suggestions).
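If your links arrive percent-escaped, one way to unescape them on your side before handing them to the API is shown below. This is just a sketch that uses Python's standard `urllib.parse.unquote`; the helper name `unescape_target` is hypothetical.

```python
from urllib.parse import unquote


def unescape_target(escaped: str) -> str:
    """Convert %xx sequences back to plain characters before calling the API."""
    return unquote(escaped)


print(unescape_target("http%3A%2F%2Fimmediatenet.com%2Fthumbnail.html"))
# prints: http://immediatenet.com/thumbnail.html
```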

Persistent errors with the WebKit

Some pages make the WebKit rendering engine get stuck. That was not happening with Firefox. Thus, all accesses are given a certain timeout, and if there's no content, a blank page is created and cached. Let's hope the WebKit gets better along the way.



Service

Everything, except the URL source, is case sensitive. You can play around and see what the thumbnails look like in different sizes etc. right here. Check out how the service performs with 50 website thumbnails by clicking on the image below:

Top 50 sites in USA

The service API is located at 4 different locations for different sizes; s for small, m for medium, l for large, fs for full size:

<img src="http://immediatenet.com/t/s?
<img src="http://immediatenet.com/t/m?
<img src="http://immediatenet.com/t/l?
<img src="http://immediatenet.com/t/fs?

After the "service location" you append the service parameters separated by the ampersand "&" sign. The parameters are "Size=" and "URL=". The Size is the size of the virtual browser window in pixels. URL is the www address of the page. To pass the HTML validator tests, you should escape the ampersand sign by replacing it with "&amp;". The order of the parameters must be obeyed as well as the case sensitivity.
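The rules above (service location first, then "Size=", then "URL=", all case sensitive) could be wrapped in a small helper like this one. It is only a sketch: the function and constant names are made up for illustration, and nothing here is an official client.

```python
SERVICE_BASE = "http://immediatenet.com/t/"
# The four service locations named in the text.
ENDPOINTS = ("s", "m", "l", "fs")


def thumbnail_url(endpoint: str, size: str, target: str) -> str:
    """Build a request URL with the parameters in the required order."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown service location: {endpoint!r}")
    # Parameter names are case sensitive and Size must come before URL.
    return f"{SERVICE_BASE}{endpoint}?Size={size}&URL={target}"


print(thumbnail_url("m", "1024x768", "http://immediatenet.com/"))
# prints: http://immediatenet.com/t/m?Size=1024x768&URL=http://immediatenet.com/
```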

The services above do not wait at all; instead, they draw the thumbnails as soon as all data has been received. Sometimes the Mozilla engine isn't quite ready, so a portion of a page may look incomplete. Thus, on Fri, 29 Jul 2011, new APIs were introduced that contain 3 second delays (look at the tables below). Yes, it takes 3 seconds more to generate a thumbnail - but it is then stored in the cache. This way the thumbnails look more complete - if quality is of concern. Well, have a look at about 200 thumbnails that do NOT have any delays (click on the image below):

USA pages annotated

Note the difference; the number 3 is appended right before the question mark:

<img src="http://immediatenet.com/t/s3?
<img src="http://immediatenet.com/t/m3?
<img src="http://immediatenet.com/t/l3?
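Picking the delayed variant of a service location is then just a matter of appending the 3. A tiny sketch follows; the helper name is hypothetical, and it assumes that only s, m and l have delayed variants, since those are the only ones listed here.

```python
# Only these three locations are listed with a 3 second delayed variant.
DELAYED_CAPABLE = ("s", "m", "l")


def delayed_endpoint(endpoint: str) -> str:
    """Return the 3 second delayed variant of a service location."""
    if endpoint not in DELAYED_CAPABLE:
        raise ValueError(f"no delayed variant listed for: {endpoint!r}")
    return endpoint + "3"


print("http://immediatenet.com/t/" + delayed_endpoint("m") + "?")
# prints: http://immediatenet.com/t/m3?
```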

So how to use the service? For example, to get a web page "http://immediatenet.com/3d_web_pages.html" at Size 1024x768 scaled to 20% of its original size, you could have:

<img src="http://immediatenet.com/t/l?Size=1024x768&URL=immediatenet.com/3d_web_pages.html" />

Notice that, to make your page pass validator tests, it should actually be written as shown below (all other examples should be converted to this format too, but for clarity they're shown in a more readable manner):

<img src="http://immediatenet.com/t/l?Size=1024x768&amp;URL=immediatenet.com/3d_web_pages.html" />
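If you generate your pages with a script, the escaping can be done for you. Below is a minimal sketch using Python's standard `html.escape`; the `img_tag` helper is just an illustrative name.

```python
from html import escape


def img_tag(src: str) -> str:
    """Wrap a thumbnail URL in an <img> tag, escaping '&' as '&amp;'."""
    return f'<img src="{escape(src, quote=True)}" />'


print(img_tag("http://immediatenet.com/t/l?Size=1024x768&URL=immediatenet.com/3d_web_pages.html"))
# prints: <img src="http://immediatenet.com/t/l?Size=1024x768&amp;URL=immediatenet.com/3d_web_pages.html" />
```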

And the resulting thumbnail would look like:

Immediate 3D Pages

For example, to get a web page "http://immediatenet.com/" at Size 1024x768 scaled to 15% of its original size, you could have:

<img src="http://immediatenet.com/t/m?Size=1024x768&URL=http://immediatenet.com/" />

The resulting thumbnail would look like:

Immediate Internet Services

For example, to get a web page "http://immediatenet.com/thumbnail.html" at Size 1024x768 scaled to 10% of its original size, you could have:

<img src="http://immediatenet.com/t/s?Size=1024x768&URL=http://immediatenet.com/thumbnail.html" />

The resulting image would look like:

Immediate Internet Thumbnails

For example, to get a web page "http://immediatenet.com/" at Size 1024x1024 at full size, you could have:

<img src="http://immediatenet.com/t/fs?Size=1024x1024&URL=http://immediatenet.com/" />

The resulting image would look like:

Immediate Internet Thumbnails

Please store full sized images on your own servers. They are subject to a different cache policy, and full size images are stored only for a couple of days at most.
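One way to keep your own copy of a full size image is sketched below. Everything here is an assumption made for the example: the function names, the SHA-1 based file naming and the optional `fetch` callback (handy for testing without the network) are not part of the service.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def local_path(directory: str, target: str) -> Path:
    """Derive a stable local file name from the target URL (naming is an assumption)."""
    digest = hashlib.sha1(target.encode("utf-8")).hexdigest()
    return Path(directory) / f"{digest}.jpg"


def store_full_size(directory: str, target: str, size: str = "1024x1024", fetch=None) -> Path:
    """Download a full size image once and keep it on your own disk."""
    path = local_path(directory, target)
    if path.exists():
        return path  # already stored locally, no request needed
    api_url = f"http://immediatenet.com/t/fs?Size={size}&URL={target}"
    if fetch is None:
        with urlopen(api_url, timeout=60) as response:
            data = response.read()
    else:
        data = fetch(api_url)  # hypothetical hook for a custom downloader
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

Calling `store_full_size("thumbs", "http://immediatenet.com/")` a second time returns the stored file without contacting the service again.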


We also have an annotating service. The domain name of the link is automatically annotated at the bottom of the thumbnail. It works only at medium size (scaled to 15% of the original) and automatically annotates only the domain URL. If the domain name is very long, the annotated text goes "out of bounds" - the image is stretched, but the website thumbnail itself keeps its size. For example, to get a web page "http://immediatenet.com/thumbnail.html" at Size 1024x768 scaled to 15% of its original size and annotated automatically, you could have:

<img src="http://immediatenet.com/t/t_a?Size=1024x768&URL=http://immediatenet.com/thumbnail.html" />

The resulting image would look like:

Immediate Internet Thumbnails

Service Parameters

Valid combinations for "Size:"
1024x768
1024x1024
1280x768
1280x1024
800x600
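A client could check the Size value before sending a request and estimate how big the resulting thumbnail will be. This sketch takes the list above and the scaling percentages from the examples (10% for s, 15% for m, 20% for l, full size for fs). The helper names are made up, and rounding down is an assumption, since the exact rounding the engine uses is not documented here.

```python
VALID_SIZES = ("1024x768", "1024x1024", "1280x768", "1280x1024", "800x600")
# Scaling per service location, as described in the usage examples.
SCALE_PERCENT = {"s": 10, "m": 15, "l": 20, "fs": 100}


def parse_size(size: str) -> tuple:
    """Validate a Size value and split it into (width, height)."""
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported Size: {size!r}")
    width, height = size.split("x")
    return int(width), int(height)


def estimated_dimensions(endpoint: str, size: str) -> tuple:
    """Estimate thumbnail dimensions; rounding down is an assumption."""
    width, height = parse_size(size)
    percent = SCALE_PERCENT[endpoint]
    return width * percent // 100, height * percent // 100


print(estimated_dimensions("m", "1024x768"))
# prints: (153, 115)
```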

Dedicated services

If you have a lot of traffic and are concerned about the quality of service, you're welcome to discuss getting your own dedicated services with us. If you wish to get your own server or family of servers, you should contact us with the feedback form here - please include the keywords "DEDICATED SERVICE". Hopefully we'll find a deal for you - but just to be clear, that is not going to be free.

Roadmap - what's to come

1. WebKit version (in addition to Mozilla Firefox one)
2. Various thumbnail enhancement services
3. Secondary cache ("L2", before invalidating/deleting, we move the thumbnail to the secondary cache. Then, if a new request on the same URL fails (refreshing doesn't go fine due to network break etc.) we just bring the entry from the "L2" cache.)
4. Force drawing - sometimes a page cannot be loaded, so it results in a "Target URL Timeout" message. The plan is to draw those pages anyway, if something has been loaded in the 30 second (timeout) period - even incomplete pages. This may happen if the page doesn't load in 30 seconds (well possible if it's located physically very far from UK and with a very slow connection).
5. wget API - to get thumbnails of arbitrary sizes and types (all for free, yes)

Feedback / Suggestions

Feel free to suggest new features to us or just give feedback here. If you want something different (image sizes, cache policies, referer/cookie restrictions etc.), please give me a short reason why and tell me what your project is - and you'll likely get the features. Just send me an e-mail: eero.nurkkala [at] offcode.fi

News

I had a vacation for a week. During this time, perhaps on March 7 - 10, the system broke down. It's been running alone for a year or so with all its built-in scripts that should detect all erroneous behaviour and act accordingly, but when you leave it alone it starts misbehaving =). Please send me email directly if such oddities occur again.

20 Aug 2013: Ubuntu updates applied; they caused some reboots and slowdown.

Sept 11, 2012: Improvements: blank page detection + autoincrement watch = pages such as baidu.com look like they should. The page is blank at the very beginning, but after a second or so, the contents are shown. Gnome-web-photo fails to detect such behavior automatically. Added detection of hung gnome-web-photo processes. The process is started with a certain timeout, so even if gnome-web-photo gets stuck - even if its own timer neither responds nor times out - the process is killed, and a blank page is generated for you. Enjoy the WebKit!
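The idea of killing a hung renderer and falling back to a blank page could look roughly like this. It is a sketch only and not the actual janitor script: `render_or_blank` is a hypothetical name, the command is a stand-in for whatever renderer is launched, and reading the image from standard output is an assumption made to keep the example small.

```python
import subprocess
import sys


def render_or_blank(command, timeout_seconds, blank=b""):
    """Run a renderer command; return `blank` if it hangs or fails.

    subprocess.run() kills the child process itself when the timeout expires,
    so a stuck renderer cannot linger even if its own timer never fires.
    """
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return blank
    if result.returncode != 0:
        return blank
    return result.stdout


# Stand-in for a hung renderer: a child process that sleeps longer than the timeout.
hung = [sys.executable, "-c", "import time; time.sleep(5)"]
print(render_or_blank(hung, 0.5, blank=b"BLANK"))
```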

The WebKit version has now been taken into use. On Sept. 04, 2012, I updated to Ubuntu 12.04 from Ubuntu 10.04. The gtk backend was no longer supported by Firefox, so I thought it was time to switch to the WebKit. Thus, all the latest and the greatest will be supported (CSS3 etc). On Sept 05 2012 everything seems ok: Adobe Flash works on the WebKit as well. I patched gnome-web-photo; the patches will be available later.

'Final' release was applied on Sat, 03 Sep. 'Final' means that there will possibly be no more major enhancements because 99.% of the pages on the internet look just fine! Next one will be the WebKit release - in parallel to the Mozilla Firefox one.

In July 2011, Firefox 5.0 was trialed. However, it looks like there will be no Firefox 5.0 based thumbnail API, because a lot of features were taken away. We would have to backport many dropped Firefox / Mozilla engine features ourselves. That isn't out of the question, but we'll need to see whether we've got time for that - or whether we'll just switch to the WebKit version directly.

In the summer of 2012, a WebKit based engine will be introduced. That is a work in progress.

07 July 2011, significant improvements were made to the quality of some (rare, but 1/100 or so) pages that appeared blank before. One should no longer experience such a phenomenon, unless the target URL goes into a severe (hostile) redirection loop.

Page last updated:
30 Mar 2012: Updated the user agent string from:
pref("general.useragent.override", "Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1");
into
pref("general.useragent.override", "Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1; +http://immediatenet.com) Gecko/20100101 Firefox/4.0.1");

08 Mar 2012: The server side scripts have been enhanced a lot now. Should there be an X server problem, the system will bring everything back up at a pace no one will notice.
18 Jan, 2012: Yesterday, for the first time, I needed to call fasthost.co.uk to have the server rebooted. Bites me! There was significant downtime - for 12 hours or so! What caused the hiccup is known, and there have been enhancements to the server side "janitor" scripts. Now there's a detector that will deny access to the X server immediately when the X dies - otherwise the system "burns in flames".
Nov 29, 2011: ISP DNS servers are down, slows down thumbnail generation. Great. Hope they'll fix this soon!
Wed, 05 Oct 2011 - maintenance break (which was so far the longest)
Thu, 29 Sep 2011 (Added full size image support)
Fri, 02 Sep (New release: "Final" - fixed near blank thumbnails appearing at one swedish domain. Release is being tested only on small sized thumbnails)
Sun, 28 Aug (Installed security updates & rebooted the system)
Fri, 19 Aug (Added Open source topic)
Wed, 17 Aug (Experienced a slow-down at the server for 15 minutes due to undetected recursion. All recursion attempts are now handled properly)
Mon, 15 Aug (Removed Proxies -topic. There are no restrictions anymore, so no need to detect Blue Coat proxies)
Thu, 11 Aug (Cache policy change: from 3 days to 5 days)
Thu, 11 Aug 2011 06:02:55 +0100: Service break! Took total of 3 minutes (installed security updates & rebooted the device).
Wed, 03 Aug (New release: "Near Perfection")
Mon, 01 Aug (Removed 'reference' -restriction. Also removed "offline" service as it's redundant now; /t/m does the same!)
Sun, 31 Jul 2011 (Removed cookie references at "Preconditions" -topic, as they were not valid at all)
Fri, 29 Jul 2011 (Introduced new services (s3, m3 and l3), modified the look of the page)
Thu, 28 Jul 2011 (Updated Roadmap, Proxies and Error scenarios -topics)



