I’ve been looking at web-based sources of geographic data, and Wikipedia links are something I’ve wanted to try out for a while. I found the following page listing fossil sites worldwide:
This gives a list of sites, but with no locations:
The “Site” column contains href links which can be followed to pages like the following:
The coordinates can just be made out in the top right-hand corner of the page.
Because Wikipedia pages follow a common template, the coordinates are embedded in a <span> tag with class="geo". I already had a Java program for loading a web page and converting it into XHTML, so I used this to turn the original list page into a CSV file by extracting the data from the HTML tables. One of the columns in this file contained the links to the site-specific pages, so I wrote a second program to follow each of these links and extract the location from the site page.
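The location-extraction step can be sketched roughly as follows. This is not the original Java program, just a minimal Python illustration using only the standard library. It assumes the text inside the class="geo" span is a decimal "latitude; longitude" pair, which is how the span usually appears but is not guaranteed for every page. The names GeoSpanParser, extract_lat_lon and fetch_html are made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class GeoSpanParser(HTMLParser):
    """Collects the text of the first <span class="geo"> element."""

    def __init__(self):
        super().__init__()
        self._depth = 0       # span nesting depth while inside the geo span
        self._chunks = []     # text fragments found inside the geo span
        self.geo_text = None  # set once the geo span has been closed

    def handle_starttag(self, tag, attrs):
        if tag != "span" or self.geo_text is not None:
            return
        if self._depth > 0:
            # A nested span inside the geo span: track it so we know
            # which closing tag ends the geo span itself.
            self._depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if "geo" in classes:
                self._depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.geo_text = "".join(self._chunks).strip()

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def extract_lat_lon(html):
    """Return (lat, lon) as floats, or None if no usable geo span exists.

    Assumption: the span text looks like "47.833; 11.150".
    """
    parser = GeoSpanParser()
    parser.feed(html)
    if not parser.geo_text:
        return None
    parts = parser.geo_text.replace(",", ";").split(";")
    if len(parts) != 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None


def fetch_html(url):
    """Download a page as text (hypothetical helper, not used in the demo)."""
    request = Request(url, headers={"User-Agent": "fossil-sites-example"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


sample = '<html><body><span class="geo">47.833; 11.150</span></body></html>'
print(extract_lat_lon(sample))
print(extract_lat_lon("<html><body>No coordinates here</body></html>"))
```

Running extract_lat_lon(fetch_html(link)) over each link in the CSV would give one location, or None, per site. Returning None rather than raising an error makes it easy to count how many sites have no coordinates.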
While the general technique works, only about 25% of the sites linked to pages with lat/lon coordinates, so the final map is somewhat incomplete, although the missing locations can be added manually. The map itself was built using MapTube’s new map creation system, which works from the CSV file of data. The final map can be viewed at the following link: