Most Underrated API? The Yahoo! Term Extractor

There’s a million APIs out there, and I couldn’t be happier. It’s easy now to translate street addresses to lat/long coordinates. It’s easy to grab local results, and overlay them on a map. It’s easy to use Yahoo or Google to get all types of search results (local, images, etc), and sites like Amazon to get prices and products.

But I think one of the coolest and most underrated APIs is the Term Extractor API from Yahoo!:

In other words, you point it at a piece of content — a news article, blog post, movie review or whatever — and it returns a list of terms, or keywords (or “tags” for those of you keeping score at home).

What do you do next with a list of keywords from a piece of content? Well, lots of things. Jeremy Keith wrote yesterday about a few ideas (that seem up for grabs, if you’re in a hacking mood!).

What if you treated each returned term as a tag? You could then pass those tags to any number of tag-based services, like Flickr, Del.icio.us, or Technorati.

So, instead of the simple “here’s my Technorati profile” or “here are my Flickr pics” on a blog, you could have links that were specific to each individual blog post. If I sent the text of this post to the term extractor, it would return a list of terms like “api”, “yahoo”, etc. By passing those terms as tags to a service like Technorati or Del.icio.us, readers could be pointed to other blog posts and articles that are (probably) related.

Like he suggests, it gets interesting when you let the output from this web service be the input for another service. I was lucky enough a few months ago to lend a small bit of help to the team that brought you the Yahoo! Events Browser mashup. One challenge of that product was to get images associated with each event. If you’ve ever worked with unstructured data — event listings are super unstructured — then you know that they don’t provide many high-quality hooks for understanding their content. The team tried doing image searches on venue or artist name, but the results weren’t very relevant or interesting, even when the parsed venue or artist was accurate. So, being the put-lots-of-pieces-together types there are, they decided to use the Term Extractor to discover more accurate, meaningful, and specific query terms to then find images for. Here’s how they summed it up:

To display appropriate images for events, local event output was sent into the Term Extraction API, then the term vector was given to the Image Search API. The results are often incredibly accurate.

I’ve only seen a handful of implementations of the Term Extractor API so far. If you’ve got a cool one to point me to, or a cool idea for a future implementation, please leave ‘em in the comments below.