March 4, 2011 / jmeuropeana

update to page-generator – 2

I’ve updated the page generator I made with the Europeana API so that I can remove thumbnails that are not relevant, or that turn out not to exist.
*** start of boring tech-talk ***
I have found a solution to the problem of removing objects that are not relevant or that have no thumbnail. Unfortunately, neither condition can be checked automatically. For relevance that is obvious. Take searching for the author who wrote under the pseudonym ‘Hildebrand’: if I search only for his real name (Nicolas Beets), I find very little. If I broaden the search to include Hildebrand, I find quite a few people who are actually called Hildebrand. So to get the relevant results, I have to sift through them and keep only the good ones.
For thumbnail availability, the lack of an automated check is a nuisance. It stems from the fact that the process Europeana uses to harvest thumbnails cannot always extract one from the location that the data providers (the holders of the objects) give us. Sometimes there is no image at all; sometimes they point us not to an image but to a webpage on which an image is displayed; sometimes they use a kind of redirection that browsers can follow but a simple image harvester cannot; and sometimes the image is too small to create a thumbnail from. In these cases we have no thumbnail, even though the data provider believes it has given us a thumbnail location, and that location is what the metadata reflects. So according to the metadata record there is a thumbnail, while in reality there is not. These objects have to be excluded from the generated pages manually.

I do this by creating an ASP page (not even ASP.NET; yes, I know plain ASP is last millennium’s technology, but then so is Java 🙂) with a form containing all the thumbnails, each with a checkbox. Unchecking the unwanted thumbnails and then hitting the ‘generate’ button generates a small HTML page with the links to the selected thumbnails. This HTML page is #included in the author page. Decoupling the page layout from the list of thumbnails has the added advantage that I can change the layout and the additional links on the pages without having to redo the thumbnail selection. When I was only generating the pages this was not much of an issue, but now that I have introduced a manual editing step, having to redo all that work would be unacceptable.
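The selection step described above boils down to filtering the candidate thumbnails by the checked boxes and writing the survivors out as a bare HTML fragment. The sketch below is in Python rather than the original ASP and is only an illustration: the `Thumbnail` fields, the use of object URLs as checkbox values, the `example.org` URLs and the exact markup are all assumptions.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Thumbnail:
    """Hypothetical search result: the Europeana object page and its thumbnail."""
    object_url: str
    thumbnail_url: str
    title: str


def build_link_snippet(candidates: list[Thumbnail], checked: set[str]) -> str:
    """Return an HTML fragment linking only the thumbnails that were left checked.

    `checked` stands in for the values submitted by the form's checkboxes
    (assumed here to be the object URLs); an unchecked box is simply absent.
    """
    lines = []
    for t in candidates:
        if t.object_url not in checked:
            continue  # unchecked: irrelevant, or no real thumbnail behind it
        lines.append(
            f'<a href="{escape(t.object_url)}">'
            f'<img src="{escape(t.thumbnail_url)}" alt="{escape(t.title)}"></a>'
        )
    # No <html>/<body> wrapper: the fragment is meant to be included in a page.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    results = [
        Thumbnail("http://example.org/obj/1", "http://example.org/thumb/1.jpg", "Hugo Claus"),
        Thumbnail("http://example.org/obj/2", "http://example.org/thumb/2.jpg", "Prince Claus"),
    ]
    # Only the first box was left checked in the form.
    print(build_link_snippet(results, {"http://example.org/obj/1"}), end="")
```

Because the fragment carries no layout of its own, the surrounding page can be restyled freely without touching the selection.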

So the new workflow is:
– create a file with keywords and titles for the pages
– generate the .shtml files, HTML link files and ASP form files
– upload all generated files
– edit the HTML link files using the selection form in the associated ASP files
– selections go live immediately, since the changes are made in place on the server
– when you are happy with the results, remove the ASP form files from the server.
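As an illustration of the generation step in this workflow, here is a minimal Python sketch, not the original generator. It assumes a tab-separated keyword file and a made-up naming scheme for the three files per keyword; the one fixed ingredient is the server-side include directive `<!--#include virtual="..." -->`, which is what lets the .shtml page pull in the separately edited link snippet.

```python
import re
from html import escape

# Hypothetical page template: only the layout lives here; the thumbnails come
# from a separate snippet pulled in by a server-side include.
PAGE_TEMPLATE = """<html>
<head><title>{title}</title></head>
<body>
<h1>{title}</h1>
<!--#include virtual="{links}" -->
</body>
</html>
"""


def parse_keyword_file(text: str) -> list[tuple[str, str]]:
    """Parse an (assumed) keyword file: one 'keyword<TAB>title' pair per line."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        keyword, title = line.split("\t", 1)
        pairs.append((keyword.strip(), title.strip()))
    return pairs


def file_names(keyword: str) -> dict[str, str]:
    """Hypothetical naming scheme for the three files generated per keyword."""
    base = re.sub(r"[^a-z0-9]+", "_", keyword.lower()).strip("_")
    return {
        "page": f"{base}.shtml",        # layout; safe to regenerate and re-upload
        "links": f"{base}_links.html",  # thumbnail snippet; edited via the form
        "form": f"{base}_select.asp",   # selection form; removed when done
    }


def render_page(title: str, links_file: str) -> str:
    """Fill the page template; the snippet file name goes into the include."""
    return PAGE_TEMPLATE.format(title=escape(title), links=escape(links_file))


if __name__ == "__main__":
    for keyword, title in parse_keyword_file("Hugo Claus\tHugo Claus\n"):
        names = file_names(keyword)
        print(names["page"])
        print(render_page(title, names["links"]))
```

The naming scheme drops accented characters, so a real keyword list would need a better slug function; it is kept simple here for the sake of the example.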

If the page template changes, you generate new files, then upload only the changed .shtml files. The HTML snippets with the selected thumbnails remain unchanged.
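Picking which regenerated pages to upload can be automated by comparing them with the copies already on the server. This Python sketch assumes you keep a local mirror directory of the uploaded files (an assumption, not part of the original setup) and compares SHA-256 digests; it deliberately looks only at .shtml files, so the hand-edited link snippets are never overwritten.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of a file's bytes, used to detect changed content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def pages_to_upload(generated_dir: Path, uploaded_dir: Path) -> list[str]:
    """Names of regenerated .shtml pages that are new or differ from the uploaded copy.

    `uploaded_dir` is a hypothetical local mirror of what is on the server.
    Only *.shtml is considered: the link snippets hold the manual thumbnail
    selection and must not be replaced by freshly generated ones.
    """
    changed = []
    for page in sorted(generated_dir.glob("*.shtml")):
        old = uploaded_dir / page.name
        if not old.exists() or digest(old) != digest(page):
            changed.append(page.name)
    return changed
```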

Examples:
When searching for Flemish author ‘Hugo Claus’, I also find images of the Dutch royal family, because both names Hugo and Claus occur in that family. So removing them is a good idea.
original page for ‘Hugo Claus’
form to edit ‘Hugo Claus’ links. Note: this is a sample; the Generate button is not active and just resets the form.
link snippet for ‘Hugo Claus’
new page embedding this link snippet for ‘Hugo Claus’. Note: this is now a .shtml file, because of the need for the include.
*** end of boring tech-talk ***

What this means for you, dear reader, is better-quality pages. For example, the pages on the Dutch Literary Canon accessed from vraagbaak.brinkster.net/cnl/ned_auteurs.html are now more relevant and contain fewer objects without a thumbnail.

Good news: I have also noticed that the pages generated using the page generator are now indexed by Google. While the primary purpose of these pages is to create a meaningful and thematic reference to the stuff we hold in Europeana, each outside link to a Europeana object is useful in our SEO efforts.

Obligatory Europeana object:
As this post discusses the mechanism to choose various objects to include, I think this is appropriate: ‘The choice between virtue and passion’:
De keuze tussen deugd en hartstocht


2 Comments

  1. Dan Brickley (@danbri) / Mar 17 2012 18:13

    Are you still working on this? I’d love to be able to get an embeddable sheet of thumbnails, given an artist or movement name…

Trackbacks

  1. Work in progress: A Europeana Search API based image grid generator | Kadmeian Letters
