Archive for July, 2010

Making sense of data management ‘landscapes’

There are some fantastic developments in visualising data. From tag clouds to infographics to heatmaps to geospatial mashups to sparklines, finding new ways to understand and present data is essential to extracting value from the ‘data deluge’ and to solving the small, medium and grand challenges of our time. This excites me enormously.

It is, however, the domain of people who are cleverer than I (or at least much more adept at programming and using databases and analytics tools).

One of my major areas of work is understanding and improving the whole-of-sector ‘landscape’ of data/information management in the environment sector. For the last eight years I’ve worked on strategy, policy and projects to help connect the many different datasets and systems in this domain. The aim is to enable better access to knowledge generated from research, and better decision making and improved environmental management (including biodiversity, biosecurity, water and climate) by government agencies such as MfE, DOC, MAF Biosecurity, ERMA and the AHB, by local government agencies, and by NGOs and community groups.

By sharing data, and providing ‘middleware’ (such as the NZ Organisms Register) to connect data across different agencies, we give people more opportunity to develop and/or use tools that improve the quality of the decisions they’re making and the cost-effectiveness of the limited resources we have for environmental management.

I’ve had a particular focus on information systems for biodiversity (the conservation of native species and ecosystems), but am now doing more work relating to information systems supporting biosecurity (preventing pest incursions and eradicating/managing existing pests).

Recently the Terrestrial and Freshwater Biodiversity Information Systems Programme (TFBIS) asked me to help determine where the gaps were in the biodiversity information systems ‘landscape’. I wrote the TFBIS strategy in 2006/2007, and since then a number of systems have been developed to provide access to, and connect, existing datasets. The strategy helped guide decisions on funding grants for such systems, but it gave no way to monitor the progressive development of an interconnected, federated ‘meta-system’ for biodiversity management, or to understand which major pieces of ‘middleware’ needed to be developed next.

So, I made a biodiversity data landscape diagram. This shows the primary datasets, the sources of aggregated primary data, national middleware, web services, models & data transformation tools, interpretive tools, and user interfaces (for discovery, access, and data entry).

The diagram is a work in progress, and is very likely missing some items. If you know of anything that should be there but isn’t, please let me know. At the moment it’s a PDF, with many of the items linked to their web sites. In the future I’d like to create a more interactive version, hooked to a proper metadata repository.
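A machine-readable version of the diagram could start as simply as a list of items, each tagged with its level and the items it draws on, which makes gaps easy to query. This Python sketch is hypothetical: the level names come from the diagram description above, but the item names, the `Item` structure and the two gap checks are illustrative assumptions, not how the diagram was actually built.

```python
from dataclasses import dataclass, field

# Levels named in the diagram description, ordered from raw data up to user interfaces.
LEVELS = [
    "primary datasets",
    "aggregated primary data",
    "national middleware",
    "web services",
    "models & data transformation tools",
    "interpretive tools",
    "user interfaces",
]

@dataclass
class Item:
    name: str
    level: str
    # Names of other items this one draws data from (hypothetical field).
    connects_to: list[str] = field(default_factory=list)

def empty_levels(items: list[Item]) -> list[str]:
    """Levels with no items at all: candidate gaps in the landscape."""
    occupied = {item.level for item in items}
    return [level for level in LEVELS if level not in occupied]

def unconnected(items: list[Item]) -> list[str]:
    """Items above the primary-data level that draw on nothing else."""
    return [i.name for i in items if i.level != LEVELS[0] and not i.connects_to]

if __name__ == "__main__":
    # Hypothetical items, for illustration only.
    items = [
        Item("dataset A", "primary datasets"),
        Item("register B", "national middleware", ["dataset A"]),
        Item("viewer C", "user interfaces"),
    ]
    print(empty_levels(items))
    print(unconnected(items))
```

Stored in a proper metadata repository, the same structure could drive an interactive diagram rather than a static PDF.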

It’d also be neat to see this approach used for other domains such as biosecurity, water and climate.

Biodiversity Data Landscape Diagram

See the key to the diagram for descriptions of the types of items and definitions for each of the ‘levels’.

The data deluge

Next week I’m facilitating the ‘Research Data Matters’ workshop for the Ministry of Research, Science and Technology, the National Library of New Zealand and the Royal Society of New Zealand. This one-day event will discuss issues surrounding the long-term management of publicly funded research data.

I’ve been working on research data policy issues with MoRST for about seven years now, and it’s exciting to see how far we’ve come in that time. Last week one of my frequent collaborators at MoRST asked whether I’d seen any infographics representing the ‘data deluge’, in particular the figures cited in the article of that name from the Joint Information Systems Committee (JISC) in the UK.

I’ve seen some excellent ones on the size of the Internet and on file storage volumes, but nothing covering those figures, so I decided to make one. It uses physical objects to show the relative scale of moving from a megabyte up to an exabyte.

data deluge infographic

Apparently the current size of the Internet is estimated at 5 million terabytes, or 5 exabytes. Note that the JISC article is from late 2004, so estimates of the total annual production of information may well have gone up since then.
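The conversion behind that figure is simple arithmetic with decimal prefixes: each step (kilo, mega, giga, tera, peta, exa) is a factor of 1,000, so one exabyte is a million terabytes. A minimal Python sketch, assuming decimal rather than binary (1,024-based) units; the function names are illustrative only.

```python
# Decimal (SI) prefixes: each step up this list is a factor of 1,000.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte", "exabyte"]

def bytes_per(unit: str) -> int:
    """Bytes in one `unit`, using 1,000-based prefixes."""
    return 1000 ** UNITS.index(unit)

def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Express `amount` of `from_unit` in `to_unit`."""
    return amount * bytes_per(from_unit) / bytes_per(to_unit)

if __name__ == "__main__":
    print(convert(5, "exabyte", "terabyte"))  # 5000000.0
    print(convert(1, "exabyte", "megabyte"))  # 1000000000000.0
```

With binary units (a tebibyte is 1,024**4 bytes) the numbers shift slightly, but the orders of magnitude stay the same.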

For those particularly interested in the actual sizes: they’re not scaled by precisely 1,000 each time, but they are fairly close. Here are the numbers:

Length of a tiny ant: 1.4 millimetres
Height of a short person: 1.4 metres
Length of the Auckland Harbour Bridge: 1,020 metres
Length of New Zealand: 1,600 kilometres
Diameter of the Sun: 1,390,000 kilometres
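One way to see how close the scaling is: convert every size to metres and divide each by the one before it. This is a minimal Python sketch; the metre conversions are mine, and it assumes each object stands for one step from megabyte to exabyte. With these figures the steps come out at roughly 1,000, 729, 1,569 and 869, and the Sun is about 9.93 × 10^11 ant-lengths across, close to the 10^12 that separates a megabyte from an exabyte.

```python
# Sizes from the list above, converted to metres so they can be compared directly.
OBJECTS = [
    ("tiny ant", 1.4e-3),                  # 1.4 millimetres
    ("short person", 1.4),                 # 1.4 metres
    ("Auckland Harbour Bridge", 1_020.0),  # 1,020 metres
    ("New Zealand", 1_600e3),              # 1,600 kilometres
    ("the Sun", 1_390_000e3),              # 1,390,000 kilometres
]

def step_ratios(objects):
    """Ratio of each object's size to the size of the object before it."""
    return [big / small for (_, small), (_, big) in zip(objects, objects[1:])]

if __name__ == "__main__":
    for (name, _), ratio in zip(OBJECTS[1:], step_ratios(OBJECTS)):
        print(f"{name}: about {ratio:,.0f} times the previous object")
    overall = OBJECTS[-1][1] / OBJECTS[0][1]
    # With decimal prefixes, megabyte to exabyte is exactly 10**12.
    print(f"Sun / ant: {overall:.3g}")
```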

This infographic is licensed by Julian Carver under the Creative Commons Attribution-ShareAlike 3.0 New Zealand License.

We need to start from the cold blooded premise that almost everyone is a genius - not that almost everyone is worthless.
John Taylor Gatto