NoSQL Search Roadshow Zurich 2013

When: 5. Sep 2013 at 18:00

Abstract

GOTO Night with D. Milosevic & T. Wagner

Date: Thursday, September 5th, 2013
Time: 18:00 / 6 PM
Venue: Technopark
Address: Technoparkstrasse 1, 8005 Zurich
Room: Schulungsraum Newton 1009, 1st floor (1. OG)
Cost: Free of charge

"Denormalizing Data Sets in Hadoop while Building Lucene Indexes" by Dragan Milosevic

Abstract
The invention of Hadoop made it possible to cost-efficiently collect and analyze petabytes of data in order to extract valuable information that critically supports the daily business of more and more companies. The collected data that supports zanox's business comes from its tracking systems that generate billions of events on a daily basis, search engines that provide cost information and master data about millions of advertisers and publishers. Efficient querying of the extracted information is then achieved by denormalizing the data sets and storing the results in Lucene indexes. One of the basic operations performed during denormalization is joining huge data sets. While designing an optimal joining strategy for a given task, many lessons about the strengths and weaknesses of different approaches were learnt, and they will be presented.
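To make the joining step concrete, here is a minimal in-memory sketch of a reduce-side (repartition) join, one of several common joining strategies. The abstract does not say which strategy zanox uses, so this is only an illustration. The record fields (`advertiser_id`, `name`, `clicks`) and the function name are assumptions, and plain Python dictionaries stand in for Hadoop jobs and Lucene documents.

```python
from collections import defaultdict


def reduce_side_join(events, advertisers):
    """Inner-join events to advertiser master data on advertiser_id.

    Hypothetical record layout; a real job would run as map and reduce
    tasks over distributed files rather than over in-memory lists.
    """
    # "Map" phase: group records from both inputs by the join key.
    groups = defaultdict(lambda: {"master": None, "events": []})
    for adv in advertisers:
        groups[adv["advertiser_id"]]["master"] = adv
    for ev in events:
        groups[ev["advertiser_id"]]["events"].append(ev)

    # "Reduce" phase: emit one flat (denormalized) document per event.
    docs = []
    for key in sorted(groups):
        master = groups[key]["master"]
        if master is None:
            continue  # inner join: skip events without master data
        for ev in groups[key]["events"]:
            doc = dict(ev)
            doc["advertiser_name"] = master["name"]
            docs.append(doc)
    return docs


advertisers = [
    {"advertiser_id": 1, "name": "Acme"},
    {"advertiser_id": 2, "name": "Globex"},
]
events = [
    {"event_id": "e1", "advertiser_id": 1, "clicks": 3},
    {"event_id": "e2", "advertiser_id": 2, "clicks": 5},
    {"event_id": "e3", "advertiser_id": 9, "clicks": 1},  # no master record
]
denormalized = reduce_side_join(events, advertisers)
```

Each resulting document carries the advertiser name alongside the event fields, so a query over the index would not need a join at search time. The trade-off is that master data is duplicated into every event document.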

Download Slides [PDF]

Bio
Dr. Dragan Milosevic is a certified Solr/Lucene, Hadoop and HBase developer and currently works as a senior architect at zanox in the distributed computing team, which uses various open-source projects to create a world-class reporting framework. He is the author of the book "Beyond Centralized Search Engines: An Agent-Based Filtering Framework", which describes the application of various machine-learning techniques for solving cooperation and coordination challenges in distributed systems. Those principles are nicely integrated in the zanox reporting framework, which represents a very successful application of various Apache open-source projects. His talks at Hadoop Get Together 2010 and 2012, Buzzwords 2012, ApacheCon Europe 2012 and Lucene Revolution 2013 uncovered details about version-mismatch handling during communication, aggregation map-reduce jobs, resource-aware query routing and search analytics for guiding index-building.

"Physical design on Graph Databases - an example from InfiniteGraph" by Timo Wagner

Abstract
Database application design is often described as a combination of logical and physical design: logical design is the process of creating a logical schema that provides the persistence-related functionality needed by the application, while physical design is the process of determining how that logical schema is to be represented in the database.
Typical physical design for applications built upon relational databases involves creating an internal schema (defining the tables), mapping the logical schema to the internal schema (classes to tables), and choosing various DB-specific storage options for the internal schema components (tables and sometimes table fragments).
Because InfiniteGraph is a graph database, typical physical design for applications built upon it is somewhat different. One main difference between InfiniteGraph and relational DBs is that in InfiniteGraph the logical and internal schemas are usually the same, so there is usually no need to develop an internal schema or to map between logical and internal schemas. Another significant difference is that the storage options that Objectivity provides are independent of the schema (logical or internal), which leads to greater flexibility (also known as physical independence).

Given the above, physical design in an InfiniteGraph application is mostly placement design and can be described as the process of determining where nodes or edges are placed into containers, where containers are placed into databases, and how container and database files are placed (distributed) in order to achieve optimal read, write, and concurrency performance. The Placement Manager (PM) feature dramatically improves ease of use by simplifying and streamlining what is often the hardest part of InfiniteGraph application development.
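The effect of placement can be illustrated with a small sketch in plain Python. It does not use the InfiniteGraph API or the Placement Manager. The function names, the `region` attribute, and the container naming are all assumptions made for illustration. The sketch compares two hypothetical strategies by counting how many edges cross container boundaries, a rough proxy for how many containers a traversal has to touch.

```python
def place_by_key(nodes, key):
    """Place each node in a container named after one of its attributes."""
    return {node_id: f"container_{attrs[key]}" for node_id, attrs in nodes.items()}


def place_round_robin(nodes, n_containers):
    """Spread nodes evenly over containers, ignoring how they are connected."""
    return {
        node_id: f"container_{i % n_containers}"
        for i, node_id in enumerate(sorted(nodes))
    }


def cross_container_edges(edges, placement):
    """Count edges whose endpoints live in different containers."""
    return sum(1 for src, dst in edges if placement[src] != placement[dst])


# Hypothetical graph: two tightly connected groups joined by a single edge.
nodes = {
    "a1": {"region": "eu"}, "a2": {"region": "eu"}, "a3": {"region": "eu"},
    "b1": {"region": "us"}, "b2": {"region": "us"}, "b3": {"region": "us"},
}
edges = [("a1", "a2"), ("a2", "a3"), ("b1", "b2"), ("b2", "b3"), ("a3", "b1")]

clustered = place_by_key(nodes, "region")
scattered = place_round_robin(nodes, 2)

print(cross_container_edges(edges, clustered))  # 1
print(cross_container_edges(edges, scattered))  # 5
```

In this toy graph, grouping connected nodes into the same container leaves one cross-container edge, while round-robin placement leaves five. Real placement decisions also have to weigh write throughput and concurrency, which this sketch ignores.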

In this talk we will outline why placement is important for the performance of your graph application, show how you can model certain placement strategies in InfiniteGraph, and present performance improvements achieved in a customer application.

Bio
Timo Wagner currently works as Senior Technical Consultant EMEA for Objectivity, Inc. He has more than 10 years of experience in the SQL and NoSQL database field as a software architect and developer. He was part of the core development team that redesigned the Sones Graph/DB. With his experience in developing and managing large data sets, he is a specialist in distributed graph databases, data quality and big data.

If you have any questions or have to cancel your registration, please don't hesitate to contact Louise Böhm.
