October 26, 2005

Internet Librarian: Fueling Engines for the Future

Fueling Engines for the Future
DeWitt Clinton, A9
David Mandelbrot, Yahoo
Peter Norvig, Google

This panel, with one representative from each of three huge and powerful companies, was very interesting from the point of view of a regular user (that’s me!).

A9
DeWitt began by telling us that A9.com powers Amazon’s product search.  You log into A9 with your Amazon account.  Results appear in columns (vertical searches).  One of the verticals is “web” and one is “images.”  There are other verticals you can choose: books, movies, blogs, people, etc.  DeWitt noted that most search engine APIs, while proprietary, are very similar, so A9 introduced OpenSearch.  OpenSearch is a proposal for a common format for search requests and results; it identifies the minimal subset of data necessary for search syndication and re-uses existing and familiar standards like RSS.  A9 offers about 300 OpenSearch Columns (white pages search, Creative Commons, Flickr, PubMed, etc.).  OpenSearch was launched in March 2005, has a new search engine added every day, and is Creative Commons licensed [nice!].  Microsoft is building OpenSearch into the next version of Internet Explorer.
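
[To make the "search results as RSS" idea a bit more concrete, here is a minimal sketch of my own, not something shown in the session.  It assumes the OpenSearch 1.0 RSS extension namespace and its totalResults element; the sample feed, the example.com URLs, and the function name are all made up for illustration.]

```python
import xml.etree.ElementTree as ET

# Namespace of the OpenSearch 1.0 RSS extensions (assumed version).
OPENSEARCH_NS = {"os": "http://a9.com/-/spec/opensearchrss/1.0/"}

# Hypothetical sample response; not real output from any search engine.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">
  <channel>
    <title>Example search: librarians</title>
    <link>http://example.com/search?q=librarians</link>
    <description>Hypothetical results</description>
    <openSearch:totalResults>2</openSearch:totalResults>
    <openSearch:startIndex>1</openSearch:startIndex>
    <openSearch:itemsPerPage>10</openSearch:itemsPerPage>
    <item>
      <title>First result</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second result</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_opensearch_rss(xml_text):
    """Return (total_results, [(title, link), ...]) from an OpenSearch-style RSS feed."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    # The result count lives in the OpenSearch namespace, not in plain RSS.
    total = int(channel.findtext("os:totalResults", default="0",
                                 namespaces=OPENSEARCH_NS))
    # Individual results are ordinary RSS items.
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return total, items


total, items = parse_opensearch_rss(SAMPLE_FEED)
print(total, items)
```

[The point, as I understood it, is that any client that already reads RSS can read results like these, and only the few extra OpenSearch elements are new.]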

Google
Peter [who I also saw speak at the California Library Association’s conference in 2004] showed us some of the newer Google features.  One is its direct answer service, where you type in a question and get the answer at the top of the results page.  Like Greg in the last session, he also mentioned the “See results for…” feature, which suggests a search similar to yours and is listed halfway down the results.  Google is also offering statistical machine translation for translating materials from one language to another.  They’re averaging 1-2 disfluencies [umm, is that a word?] per sentence.  He showed us Google Maps, where you can drag things around and view satellite images.  He showed us some examples of people doing interesting things with Google APIs, including Placeopedia, which links Wikipedia place articles to their locations on maps [this seems pretty cool, and is something I personally haven’t seen before].

Yahoo
David described some of the newer features from Yahoo.  Yahoo’s search ethics can be described as FUSE (Find, Use, Share, and Expand) human knowledge.  Yahoo is partnering with for-pay content providers and allowing users to personalize what results they get from for-pay providers (like the Wall Street Journal).  He also noted that they provide a search specifically for Creative Commons content (also mentioned in the last session).  Yahoo launched My Web 2.0, allowing users to save, tag, and annotate pages they’ve found useful and to share them with other users.  Searchers can then narrow their results to things that others in their communities have found useful.  Finally, he discussed Yahoo’s participation in the Open Content Alliance, a joint effort of international organizations to build an open and permanent digital archive (competing with Google Library).  They plan to offer the full text, rather than snippets as Google does.  It will be freely crawlable and include both multimedia and text content.

Questions from the audience
• Someone who works for NIH asked what these 3 companies are doing to include publicly funded research (open access scholarly literature) in their engines.  Peter noted Google Scholar, David noted that Yahoo has partnered with NIH to get live feeds from their database for their search engine, and DeWitt stated that A9 is hoping that content providers like the NIH will use syndication formats like OpenSearch to allow clients anywhere to access their content.
• Someone asked about copyrighted scholarly material and how to create access to these materials.  Greg responded that it’s a bigger problem than search engines can address.  David responded that a lot of publicly available material seems to be available only in licensed for-pay databases, and that is a problem.  He also added that the Open Content Alliance will start making data available this year, completely for free, and that’s one small step in the right direction.
• Someone asked the panelists if any of their new initiatives have made something they had in the past seem less relevant.  DeWitt stated that OpenSearch will probably overturn many things that search engines have right now.  David noted that Yahoo’s Directory is being made irrelevant by people’s tendency to search rather than browse.  Peter responded that the tabs at the top of the page (sorting by image, etc.) are becoming less relevant and are not visible to many searchers, and that something that looks like an image query will go straight to the image tab instead of making the user click on the tab.
• Gary Price noted that the same day Yahoo announced Yahoo Subscriptions, Gale announced a similar service (Access My Library) that would put Gale content into Yahoo databases and make access for library users available with a log-in.  Peter noted that Google is trying to get that kind of information as well.  David also mentioned Yahoo’s partnership with OCLC on Open WorldCat.
• Someone asked about the satellite data in Google, and what the latency period is as some of the data seems to be a couple of years old.  Peter admitted it is spotty, and that coverage of metropolitan areas is better.  DeWitt noted that A9 is covering new cities all the time and refreshing older stuff.
• Someone asked whether it would be difficult to mark the satellite images with the dates they are from.  Peter replied that this is certainly a good idea.
