ICWR07-07

Information Discovery on the WWW, Using Search by image

Workshop code: ICWR07-07

Time: Thursday, 31 Farvardin 1396 (20 April 2017), 16:30 to 18:30

Location: Tehran

Subject area:

Presenters: Paul Nieuwenhuysen, full professor at Vrije University, Belgium (the presentation is in English)

Workshop Abstract:

Search by image on the WWW is a relatively new method for information retrieval, in which a search query consists not of text, but of an image file. Other names for this method include ‘reverse image searching’ and ‘reverse image lookup’. A query can also include both text and an image. The search results lead to related images and also to related documents on the WWW. This method of searching can address several types of information needs for which more classical search methods fail or perform less efficiently. An obvious application is the discovery of duplicate images and of images that have some elements in common with the image in the search query. Furthermore, the technology is steadily improving towards the discovery of images that are not only visually related, but also semantically related; in parallel, this can yield information about the contents of the image used in the query. Combining text and an image in one search query can yield more precise results than simpler queries.

Workshop Contents:

Introduction:

This tutorial workshop is based on a continuing investigation of the power, applicability, and usefulness of search by image through the Internet.

Background and purpose:

In this relatively new method for information retrieval, a query does not consist of text, but of an image file. The search results lead to images on the WWW and also to related documents. Other names for this method are:

- Search(ing) by example

- Reverse image search(ing)

- Reverse image lookup = RIL

- Backwards image search(ing)

- Inside search(ing)

- Content-based image retrieval = CBIR

Furthermore, a search query can also consist of a combination of an image with text.

Topics:

- Several online services are available free of charge to search by image.

- Differences among these services are substantial.

- The search service offered by Google performs relatively well.

- Google can reveal images on the Internet that are duplicates of the query/source image; however, the success varies considerably from case to case.

- This recall performance is strongly correlated with the performance of a more classical Google search by text to find copies of the query/source image file on the Internet.

- Even images that are modified versions of the query/source image can be revealed by Google; more specifically, such modified versions can differ from the source image in size and in colours; the method can even reveal a fragment or edited fragment of the source image that is included in some image on the Internet.

- My tests have demonstrated that since 2014 search by image can not only find images that are visually similar to the query/source image, but can even retrieve images that are semantically similar/related to it, even when visual similarity is not obvious. The search results may also include a description of the subject of the image. This is particularly useful when the user does not yet know much about the subject, because the description makes it possible to formulate a specific text query afterwards. Furthermore, other information related to the image and relevant links may also be included in the search results.

- The performance of search by image to find images that are semantically similar to the query/source image is improving.

- The freely available search system offered by Google supports not only a pure search with words or with a source image; a search query can also consist of a combination of an image with words. This allows us to combine the strengths of more classical text retrieval with the more recent search by image. My tests have shown that such a combined query yields search results with a higher precision than either search method used alone.
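
The observation above that modified versions of a source image (different size, different colours) can still be found may be easier to picture with a small example. The following is a minimal sketch of one generic near-duplicate technique, an "average hash" fingerprint compared by Hamming distance. It is an illustration only: the workshop does not describe how Google's service works internally, and the function names and synthetic test images here are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values in 0..255


def resize_nearest(img: Image, width: int, height: int) -> Image:
    """Nearest-neighbour resize of a grayscale image."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def average_hash(img: Image, size: int = 8) -> int:
    """Shrink to size x size, then set one bit per pixel brighter than the mean."""
    small = resize_nearest(img, size, size)
    pixels = [p for row in small for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Synthetic images (hypothetical data, invented for this illustration).
source = [[x * 255 // 63 for x in range(64)] for _ in range(64)]     # horizontal gradient
enlarged = resize_nearest(source, 128, 128)                          # same picture, larger
darkened = [[int(p * 0.8) for p in row] for row in source]           # same picture, darker
unrelated = [[y * 255 // 63 for _ in range(64)] for y in range(64)]  # vertical gradient

reference = average_hash(source)
for name, candidate in [("enlarged", enlarged), ("darkened", darkened), ("unrelated", unrelated)]:
    print(name, hamming_distance(reference, average_hash(candidate)))
```

In this sketch the enlarged and darkened copies produce the same fingerprint as the source, while the unrelated image differs in many bits. A small distance therefore suggests a copy or modified copy; the threshold to use in practice is a design choice and is not specified by the workshop.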

Various applications can be shown:

- Starting from an image that you created or that is affiliated with your organization, you may find copies/duplicates or even modified/edited versions on the WWW. This can reveal copyright infringements. More positively, it allows you to assess the impact of such images on a worldwide audience. For example, curators or owners of a collection of objects can assess the impact and reuse of photos of the physical objects in their collection on a worldwide scale.

- Starting from an interesting image that you did not create, that is perhaps not the original version, and for which the creator/author is not indicated, you may find other and better versions that are more suitable for your application and need. You may also find the author(s) on the WWW, which can be useful to obtain more information or to discuss possible copyright linked to the image.

- Searching by image may also allow us to discover that the image that illustrates and supports a document is NOT real/authentic, but that it has been copied from another site and another context, and perhaps that it has even been modified/doctored to support the text and the claims of the author of the document.

- Starting from some interesting source image, you may find semantically related images; in other words, you may discover images with a subject that is related to the subject of that source image.

- Furthermore, including some text in the search by image query may increase the precision of the results, even when too little is known in advance to use more than one or a few unspecific search words.

- Consider the scenario in which you already have sufficient information/knowledge in advance to formulate and submit a specific, focused text search query; even then, adding an image to that search query can be useful to increase the precision of the results.

Furthermore, in each of these applications, you may also find related text information.

Conclusion:

Search by image is evolving into a powerful, additional method to tackle information needs that are difficult to handle with more classical methods. Furthermore, using a combination of text and an image in a query can increase the precision of the search results in comparison with a more classical pure text search or with a simple, pure search by image.
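
Precision, as used in this conclusion, is the fraction of retrieved results that are relevant. The sketch below shows how one might compare the precision of the three query types. The result lists and relevance judgments are invented placeholders that only illustrate the calculation; they are not measurements from the tests reported in this workshop.

```python
from typing import Iterable, Set


def precision(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Fraction of retrieved items that are relevant (0.0 for an empty result list)."""
    items = list(retrieved)
    if not items:
        return 0.0
    hits = sum(1 for item in items if item in relevant)
    return hits / len(items)


# Hypothetical relevance judgments and result lists (placeholders, not real data).
relevant_items = {"a", "b", "c"}
results = {
    "text only": ["a", "x", "y", "b"],
    "image only": ["a", "c", "z", "w"],
    "text + image": ["a", "b", "c", "z"],
}

scores = {query_type: precision(hits, relevant_items) for query_type, hits in results.items()}
for query_type, score in scores.items():
    print(f"{query_type}: precision = {score:.2f}")
```

With real searches, the relevant set would come from a manual judgment of each result, and the same function could then be applied to each query type in turn.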

Recommendations for practitioners:

The growing successes of the search methods that include an image in the query to find relevant information lead me to a few recommendations:

  1. To find relevant information, librarians and information intermediaries in general, and also end-users of information discovery systems, should consider these recent, additional search methods alongside more classical methods.
  2. As a consequence, search by image deserves a place in educational courses and tutorials on information and media literacy.
  3. Authors and publishers in general want to create their publications and make these available in such a way that they rank high in the results of relevant search and discovery systems. Therefore it is good practice to take into account the workings of at least the classical, popular search services when creating and optimizing their website(s). Not only the texts in a website should be considered, but also images, to optimize
    --for a relatively classical search with a text query to find images,
    --for a more recent search by image, or
    --for a search with a query that consists of text plus an image file.
    More concretely, website developers should try to publish their meaningful images in such a way that these can be effectively harvested, analyzed and included in the database index of relevant search systems.
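
As one possible way to act on the third recommendation, a website developer could check whether the images on a page carry descriptive text. The sketch below assumes that a non-empty `alt` attribute helps search systems analyze an image; that assumption, the class name `ImageAudit`, and the sample page are illustrative and are not taken from the workshop material.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class ImageAudit(HTMLParser):
    """Collect <img> tags and separate those with and without alt text."""

    def __init__(self) -> None:
        super().__init__()
        self.missing_alt: List[str] = []           # src values lacking alt text
        self.described: List[Tuple[str, str]] = []  # (src, alt) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = (attributes.get("alt") or "").strip()
        if alt:
            self.described.append((src, alt))
        else:
            self.missing_alt.append(src)


# Hypothetical sample page, invented for this illustration.
page = """
<html><body>
<img src="IMG_0012.jpg">
<img src="bronze-astrolabe-front.jpg" alt="Bronze astrolabe, front view">
<img src="logo.png" alt="">
</body></html>
"""

audit = ImageAudit()
audit.feed(page)
print("Images without alt text:", audit.missing_alt)
print("Images with alt text:", audit.described)
```

Such a check covers only one aspect of making images discoverable; descriptive file names and surrounding text are other factors a developer may want to review by hand.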

Workshop Objectives:

Participants learn about state-of-the-art applications and limitations of reverse image search on the Internet and WWW. As a result, they will be motivated and able to apply this relatively new method to discover information and to support other potential users.