Dr. Georg Gottlob is a Professor of Informatics at Oxford University. From 2006 to 2011 he held the Chair of Computing Science at Oxford; since January 2012 he has held the Chair of Informatics.
His current research interests are in the areas of web data extraction, constraint satisfaction, computational logic, databases, database theory, query languages, and complexity theory.
He is on the Editorial Board of the following Journals: Journal of Computer and System Sciences, Artificial Intelligence, Web Intelligence and Agent Systems (WIAS), Journal of Applied Logic, and Journal of Discrete Algorithms.
Extracting Big Data from the Deep Web: Technology, Research, and Business
Do you need to rent a new apartment fulfilling certain requirements? Or would you just like to find a restaurant in your area that serves pasta al pesto as today’s special? In either case, you would most likely start a web search, but keyword search as provided by current search engines is not really appropriate. The relevant data (at least for apartments) reside in the Deep Web and require forms to be filled automatically. Moreover, a keyword web search does not allow you to pose complex queries. Solving this problem, at least for certain verticals such as real estate, used cars, or restaurants, requires extracting massive amounts of data from the heterogeneously structured websites of the Deep Web and storing them in a database with a uniform schema.
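The last step described above, mapping records from differently structured sites into one uniform schema, can be illustrated with a minimal sketch. The site layouts, field names, and sample values below are invented for illustration; they are not taken from Lixto or DIADEM:

```python
# Hypothetical sketch: normalising apartment listings scraped from two
# differently structured sites into one uniform schema. All field names
# and values are illustrative assumptions.

UNIFORM_SCHEMA = ("city", "rooms", "monthly_rent_eur")

def from_site_a(record):
    # Assume site A exposes {"location": ..., "num_rooms": ..., "rent": "950 EUR"}
    return {"city": record["location"],
            "rooms": int(record["num_rooms"]),
            "monthly_rent_eur": int(record["rent"].split()[0])}

def from_site_b(record):
    # Assume site B exposes {"town": ..., "rooms": ..., "price_eur": 1200}
    return {"city": record["town"],
            "rooms": int(record["rooms"]),
            "monthly_rent_eur": int(record["price_eur"])}

# Per-site wrappers converge on one schema, so the results can be stored
# in a single table and queried uniformly.
listings = [
    from_site_a({"location": "Vienna", "num_rooms": "3", "rent": "950 EUR"}),
    from_site_b({"town": "Oxford", "rooms": 2, "price_eur": 1200}),
]
```

In a real system the per-site wrappers would be generated (semi-)automatically from the pages themselves; the point of the sketch is only the convergence on a uniform schema.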
In this talk I will report on my 15-year venture into Web data extraction. In particular, I will discuss the Lixto project we carried out at TU Wien and the DIADEM ERC project we recently completed at Oxford. I will survey the tools and systems we constructed, the applications we carried out, and some of the research results we achieved on the logical and theoretical foundations of Web data extraction. In addition, I will report on two start-ups we spun out.
Gabriella Pasi has been Full Professor at the University of Milano-Bicocca, Department of Informatics, Systems and Communication (DISCo), since March 1, 2015.
Since 2005 she has led the Information Retrieval Laboratory (IR LAB) at the Department of Informatics, Systems and Communication.
Her main research activities are related to Information Retrieval and Information Filtering. In recent years she has addressed the issues of contextual search and user modelling. She is also conducting research on the analysis of user-generated content on social media. She has published more than 200 papers in international journals and books and in the proceedings of international conferences. She is involved in several activities for the evaluation of research; in particular, she was appointed as an expert of the Computer Science panel for the Starting Grants (until 2011) and Consolidator Grants (2012) of the Ideas Programme at the European Research Council. Since 2013 she has been President of the European Society for Fuzzy Logic and Technologies (EUSFLAT). She is a member of the Editorial Board of several international journals, and she has delivered several keynote talks and plenary lectures at international conferences related to her research interests. She has participated in the organization of several international events, serving as both organization chair and program chair.
She is Associate Editor of the following two journals:
- International Journal of Computational Intelligence Systems (IJCIS), Atlantis Press (since 2007).
- Journal of Intelligent and Fuzzy Systems (JIFS), IOS Press (since 2013).
For more information, please visit Gabriella Pasi’s page.
The issue of Information Credibility on the Social Web
In the scenario of the Social Web, where a large amount of User Generated Content is diffused through Social Media, often without any form of trusted external control, the risk of running into misinformation is not negligible. For this reason, assessing the credibility of both information objects and sources of information constitutes a fundamental issue for users. Credibility, also referred to as believability, is a quality perceived by individuals, who are not always able to discern genuine information from fake information with their own cognitive capabilities. For this reason, in recent years several approaches have been proposed to automatically assess credibility in Social Media. Most of them are data-driven, i.e., they employ machine learning techniques to identify misinformation, but recently model-driven approaches have also been emerging. Data-driven approaches have proven effective in detecting false information, but it is difficult to measure the contribution that each involved feature makes to the credibility assessment. Furthermore, especially for supervised machine learning approaches, it is difficult to obtain real-life datasets labeled with respect to credibility, in particular when referring to opinion spam. Model-driven approaches aim at defining a predictive model based on an analysis of the problem and of the identified objects and their features; in particular, approaches relying on a Multi-Criteria Decision Making paradigm compute an overall credibility assessment for a given information object (e.g., posts and blogs) by separately evaluating each feature connected to each alternative, and by subsequently aggregating the single assessments into an overall one. Several classes of aggregation operators can be employed to obtain the overall credibility estimate, thus modeling distinct behaviors of the considered process, corresponding to distinct predictive models.
Furthermore, some aggregation operators, such as Choquet integrals and copulas, make it possible to model the interaction between criteria. In this lecture the impact of aggregation will be shown in the context of assessing the credibility of user-generated content. In particular, it will be shown that quantifier-guided aggregation offers an interesting alternative to the application of machine learning techniques (in particular classifiers).
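To make the idea of quantifier-guided aggregation concrete, the sketch below follows the classical OWA scheme due to Yager: weights are derived from a regular increasing monotone (RIM) quantifier Q(r) = r^a and applied to per-criterion credibility scores sorted in decreasing order. The criteria, scores, and exponent are illustrative assumptions, not the models presented in the lecture:

```python
# Minimal sketch of quantifier-guided OWA aggregation (Yager-style).
# Each information object is assumed to have per-criterion credibility
# scores in [0, 1]; criteria and parameter values are hypothetical.

def rim_quantifier(r, a=2.0):
    """RIM quantifier Q(r) = r^a; a > 1 roughly encodes 'most criteria'."""
    return r ** a

def owa_weights(n, a=2.0):
    """Derive OWA weights from the quantifier: w_i = Q(i/n) - Q((i-1)/n)."""
    return [rim_quantifier(i / n, a) - rim_quantifier((i - 1) / n, a)
            for i in range(1, n + 1)]

def owa(scores, a=2.0):
    """Aggregate per-criterion scores into one credibility estimate."""
    ordered = sorted(scores, reverse=True)  # OWA weights positions, not criteria
    weights = owa_weights(len(ordered), a)
    return sum(w * x for w, x in zip(weights, ordered))

# Hypothetical scores for one post, e.g. source reputation, content
# coherence, social endorsement, recency (invented for illustration):
post_scores = [0.9, 0.7, 0.6, 0.2]
overall = owa(post_scores, a=2.0)
```

With a > 1 the larger weights fall on the lower-ranked scores, so a post scoring high on only one criterion receives a low overall estimate; choosing a different quantifier changes the aggregation behavior, which is exactly the modeling freedom the abstract refers to.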
Frank van Harmelen is a professor in the Computer Science department at the Vrije Universiteit Amsterdam.
Since 2000, he has played a leading role in the development of the Semantic Web. He was co-PI on the first European Semantic Web project (OnToKnowledge, 1999), which laid the foundations for the Web Ontology Language OWL. OWL has become a worldwide standard; it is in wide commercial use and has become the basis for an entire research community. He co-authored the Semantic Web Primer, the first academic textbook of the field. For more information, please visit professor Frank van Harmelen’s page.
Very (VERY) Large-Scale Knowledge Representation
In the past 15 years, the field of knowledge representation has seen a major breakthrough, resulting in distributed knowledge-bases containing billions of formal statements about hundreds of millions of objects, ranging from medicine to science, from politics to entertainment, and from the specialist to the encyclopaedic.
As a consequence of this enormous increase in size, researchers in knowledge representation have been forced to reconsider many of the assumptions that were (often silently) made in traditional knowledge representation research: we can no longer assume that our knowledge-bases are consistent, we can no longer assume that they use a single homogeneous vocabulary, we can no longer assume they are static (or even that they evolve only slowly), we can no longer assume we can simply do logical deductions on the entire knowledge base, etc. How to define notions of local consistency? How to interpret conclusions if the axioms change even before the reasoning engine finishes? Can we exploit the network structure of knowledge-bases to finally define a useful notion of "context"?
In this talk we will discuss the challenges that are raised for modern research in knowledge representation now that KR finally has to face the real world, and can no longer rely on many of its previous comforting assumptions.