Saturday, November 14, 2009

Week 10 Readings

Web Search Engines
To me, the most astounding thing about web search engines is how they are able to (with moderate success) replicate human qualities. As many of our readings in LIS 2000 demonstrated, subjects like relevance are largely based on human intuition. Search engines must duplicate the thought process we as librarians go through to find the sources patrons need, using only complex algorithms. By using numbers, they are able to make relevance quantifiable. It also interests me how search engines use subtle features to optimize their searching. They have a "politeness" function that stops their crawlers from bogging down any one particular website. They are constantly improving their ability to detect spam websites. They know which keyword or phrase will give them the best results in their search. It is truly amazing how web search engines are able to translate human concepts like relevance into the language of technology.
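
To make that idea of "relevance as a number" concrete for myself, here is a minimal sketch of a bare-bones TF-IDF ranking over a made-up three-page corpus, with a fixed crawl delay standing in for politeness. The pages, query, and delay value are all assumptions for illustration, not how any real engine is built.

    import math
    import time
    from collections import Counter

    # Toy "corpus": each page is just a short string of text.
    pages = {
        "page1": "library science metadata search",
        "page2": "search engines crawl the web politely",
        "page3": "metadata makes hidden content easier to search",
    }

    def tf_idf_score(query, page_text, corpus):
        """Score one page against a query with a bare-bones TF-IDF."""
        words = page_text.lower().split()
        counts = Counter(words)
        score = 0.0
        for term in query.lower().split():
            tf = counts[term] / len(words) if words else 0.0
            df = sum(1 for text in corpus.values() if term in text.lower().split())
            idf = math.log((len(corpus) + 1) / (df + 1)) + 1  # smoothed
            score += tf * idf
        return score

    query = "metadata search"
    ranked = sorted(pages, key=lambda p: tf_idf_score(query, pages[p], pages), reverse=True)
    print(ranked)  # ['page1', 'page3', 'page2']: both query terms beat one

    # "Politeness": a real crawler waits between requests to the same host so it
    # does not bog that one server down; here that is just a fixed delay.
    CRAWL_DELAY_SECONDS = 1.0
    for url in ["http://example.org/a", "http://example.org/b"]:
        # fetch(url) would go here
        time.sleep(CRAWL_DELAY_SECONDS)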

Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting
The OAI Protocol seems to be at an interesting crossroads. From what I gathered, it has the potential to be an umbrella metadata harvesting system for many diverse content management systems. However, a lot of its current problems stem from the fact that these systems and their service providers are all unique. The article mentions how repositories vary in how complete and thorough their metadata is. Furthermore, different service providers have different standards and tagging methods for their systems. The article later states that the OAI community itself is "very loosely federated" and that "a more formal method of communication between data and service providers is needed." To me, it seems that the success of the OAI community hinges on whether, over time, these systems can be made more compatible.
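
For context on what a harvest actually looks like: the protocol is just HTTP GET requests with a "verb" parameter, answered in XML. Below is a minimal sketch of a ListRecords request for simple Dublin Core records, using an invented base URL; a real harvester would also handle resumption tokens and error responses.

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    # Hypothetical repository base URL; every OAI-PMH data provider exposes one.
    BASE_URL = "https://example.org/oai"

    # A harvest request is an HTTP GET with a "verb" parameter; ListRecords with
    # metadataPrefix=oai_dc asks for records expressed as simple Dublin Core.
    params = urllib.parse.urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    with urllib.request.urlopen(f"{BASE_URL}?{params}") as response:
        tree = ET.fromstring(response.read())

    # Titles live in the Dublin Core namespace. How completely repositories fill
    # in fields like these is exactly the variation the article describes.
    DC = "{http://purl.org/dc/elements/1.1/}"
    for title in tree.iter(DC + "title"):
        print(title.text)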

The Deep Web: Surfacing Hidden Value

This article's characterization of the deep Web surprised me on a number of levels. First of all, its massive size is unexpected. When we think of the Web, we think of something that is constantly changing due to the ephemeral nature of websites. It is odd to think that such a large amount of information remains. Along the same lines, I was also surprised when the article characterized the deep Web as relevant. Again, the ephemeral nature of websites has led us to think of anything more than a month or even a week old as too old for the Internet.

Most interesting of all was how the deep Web illustrates the importance of metadata. Because the deep Web is so massive, it cannot be browsed or tagged as easily as the surface Web. Because of its lack of metadata, it is invisible to search engine crawlers; as far as they are concerned, it might as well not exist. The article states that "serious information seekers can no longer avoid the importance or quality of deep Web information." It will be very interesting to see how they manage to bring the deep Web to the attention of search engines.
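
A toy example helped me see why crawlers miss this material: a crawler follows static links, so anything reachable only by submitting a query form (which, as I understand it, is where most deep Web content lives) never appears to it. The HTML snippet below is invented for illustration.

    from html.parser import HTMLParser

    # A link-following crawler "sees" only what static <a href> links point to.
    class LinkExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href":
                        self.links.append(value)

    # Invented surface page: one static link plus a search form whose results
    # (the deep Web part) are reachable only by submitting a query.
    page = """
    <a href="/about.html">About the collection</a>
    <form action="/search"><input name="q"></form>
    """

    parser = LinkExtractor()
    parser.feed(page)
    print(parser.links)  # ['/about.html'] -- the form-driven content never shows up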

2 comments:

  1. see internous.com for the deep web standard. emerging just for you.

  2. I find it interesting, the point you made about search engines being humanlike - I suppose we put our own qualities even into the machinery we make...just like every old robot movie/story ever worries about!
