
Big Data Analysis Interview with Bill Thompson

The Big Data Analysis Interview with Bill Thompson, Head of Partnership Development within the BBC Archives Development group, is now available!


According to Bill Thompson, the term Big Data as a general label should be viewed sceptically, as to his mind there is nothing fundamentally new in computer science terms. However, he does agree that certain technologies are worth investing in. In particular, he thinks it is very important that the EU invests from a public service point of view, to counteract the large companies that will focus purely on areas of profit. He gave two UK-related analogies of outcomes we might want to avoid: UK schools suffering in computer science education because MS Office was adopted, and big pharma not investing in cures for malaria.
Bill Thompson, Head of Partnership Development within the BBC Archives Development group, is an English technology journalist, commentator and writer, best known for his weekly column in the Technology section of BBC News Online and his appearances on Click, a radio show on the BBC World Service.


Big Data Analysis Interview with Peter Mika, Senior Scientist at Yahoo! Research Labs in Barcelona

New Big Data analysis interview with Peter Mika is out:

Within the standard interface

As an MP3 audio file

As a small video-only window

As a large video window with ability to jump to a specific segment

The main theme for Peter was using machine learning, information extraction and semantic web technologies to reduce Big Data into more manageable chunks, and how, combined with new programming paradigms such as Hadoop, we can now accomplish more. Background knowledge (in a simple form) enables Yahoo! to understand that "Brad Pitt Fight Club" is a search for the movie Fight Club starring Brad Pitt, an example of entity disambiguation.

Peter Mika is a researcher working on the topic of semantic search at Yahoo Research in Barcelona, Spain. Peter also serves as a Data Architect for Yahoo Search, advising the team on issues related to knowledge representation. Peter is a frequent speaker at events, a regular contributor to the Semantic Web Gang podcast series and a blogger at tripletalk.wordpress.com [1].

Yet another way to define Big Data

What is Big Data? Is 1 petabyte considered Big Data? Maybe 10 petabytes?

According to Wikipedia: "big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications". OK, but is Hadoop a traditional data processing application or not? It has, after all, been around for almost five years…

According to Gartner: "Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." But then again: what is high-volume and high-velocity? Is a 1-petabyte data set considered a high-volume information asset or not?

Some will say that the size of Big Data is a moving target; others will claim that Big Data is data that doesn't fit into one computer's memory. And though these last two examples capture the fluid nature of Big Data, they are not sufficiently formal.

From the very beginning of the computer industry, data sets have grown bigger and bigger, and IT departments have always been concerned about insufficient resources to support this growth. So what has changed now to create the "Big Data" buzz?

It seems that what makes Big Data "Big" is not some size threshold or some velocity threshold, but rather a ratio: the ratio between data volume, data velocity and the hardware available. If we could expect CPUs to be 1000 times faster next year, and memory 1000 times bigger and faster, would we care about a 40-60% annual growth rate in data set size? Probably not. So the defining ratio is between the data growth rate and the hardware growth rate: specifically, CPU performance, storage and memory capacity, and storage and memory speed. Under this definition, and taking into account the hardware growth rate of the last 10 years (slower than in the 90s) and the very high data growth rate over the same period, it becomes much clearer why 15-20 years ago we didn't hear about Big Data and now we do. So maybe instead of saying Big Data we should say Small Hardware? Or maybe both?
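The ratio argument above can be sketched as a toy calculation. All growth rates here are illustrative assumptions, not measured figures; the point is only that compounding a gap between data growth and hardware growth quickly produces "Big Data" pressure (ratio above 1) or relief (ratio below 1).

```python
# Toy model of the "Big Data" ratio: data growth vs. hardware growth.
# The rates used below are made-up illustrations, not real statistics.

def compound(rate: float, years: int) -> float:
    """Total growth factor after `years` of compounding at `rate` per year."""
    return (1 + rate) ** years

def data_to_hardware_ratio(data_rate: float, hw_rate: float, years: int) -> float:
    """>1 means data outpaces hardware (pressure builds); <1 means hardware keeps up."""
    return compound(data_rate, years) / compound(hw_rate, years)

# Assumed 2000s-style scenario: 50% annual data growth vs 30% hardware improvement.
pressure = data_to_hardware_ratio(0.50, 0.30, 10)
# Assumed 1990s-style scenario: hardware improving faster than data grows.
relief = data_to_hardware_ratio(0.30, 0.60, 10)

print(f"data outpaces hardware by a factor of {pressure:.1f}")
print(f"hardware outpaces data: ratio {relief:.2f}")
```

Even a modest per-year gap compounds into a large factor over a decade, which matches the intuition that the buzz appeared only once data growth pulled ahead of hardware growth for a sustained period.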


Big Data Analysis Interview with Jeni Tennison, Technical Director of the Open Data Institute

Jeni Tennison

The big data analysis interview with Jeni Tennison, Technical Director of the Open Data Institute, interviewed by John Domingue, is out:



Within the standard interface

As an MP3 audio file

As a small video-only window

As a large video window

Jeni discussed how open data can be found and combined to support decision making. A key technology of interest, she pointed out, is the discovery of datasets distributed across the internet, along with tools that automate this discovery.

Within the wider UK public sector, Jeni Tennison worked on the early linked data work on data.gov.uk, helping to engineer new standards for the publication of statistics as linked data; building APIs for geographic, transport and education data; and supporting the publication of public sector organograms as open data. She continues her work within the UK's public sector as a member of both the UK Government Linked Data Group and the Open Data User Group [1].

Big Data Analysis Interview with Prasanna Lal Das, Lead Program Officer, Office of the Controller, World Bank

Big Data Analysis Interview with Prasanna Lal Das, Lead Program Officer (Knowledge & Innovation) at The World Bank is now available online:

The interview covered how he sees Big Data helping to tackle poverty and proactively address corruption by supporting decision making based on real-time data. On behalf of Prasanna we would like to stress that the opinions provided in the interview are his own, and that Prasanna should not be regarded as an expert in poverty or anti-corruption measures.

Prasanna Lal Das is a senior program officer in the Office of the Controller, World Bank. He is a content strategist and KM practitioner with experience in journalism, computer game design, and management consulting.

Big Data Analysis Interview with Usman Haque, Pachube Founder and Director of the Urban Projects Division at COSM

Usman Haque

Check out our big data analysis interview with Usman Haque, Pachube founder and Director of the Urban Projects Division at COSM:

Within the standard interface

As an MP3 audio file

As a small video-only window

As a large video window

Usman mostly covered a community-oriented view of Big Data acquisition, which he says is very important if citizens and communities are to engage fully with important issues in the world. Key here is the fact that a community can overcome any deficiencies in the data (errors or heterogeneities) by creating its own specific tools.

Usman Haque has worked with interactive environments for many years. He founded Pachube, a web platform for building internet-connected devices, buildings and environments, and for storing, sharing and discovering real-time sensor, energy and environmental data; it was acquired by LogMeIn in 2011. Later, Usman took part in launching the COSM.com platform, where he headed up urban projects dealing with data, sensors and the internet of things.

Big Data Analysis Interview with Alon Halevy, Research Scientist at Google

Alon Halevy

The Big Data Analysis Interview with Alon Halevy, Research Scientist at Google, is now online:

As an MP3 audio file

As a small video-only window

As a large video with ability to jump to a specific segment

Standard interface

In the interview Alon drew upon his work on Google Fusion Tables, which allows users to upload and store their datasets. A collection of technologies that are not necessarily new, but are now beginning to work at scale, are having an impact. These include: reconciling entities (saying X and Y are the same thing), resolving schema and ontology differences, extracting high-quality data from the web, large ontology-based datasets (typically built from Wikipedia), and crowdsourcing computation.
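The first of those technologies, entity reconciliation, can be illustrated with a minimal sketch. The normalization rules and sample records below are invented for illustration and are in no way Google's actual method; real reconciliation systems use far richer signals than string matching.

```python
import re

def normalize(name: str) -> str:
    """Lowercase, strip punctuation and common corporate suffixes, collapse spaces."""
    name = re.sub(r"[^\w\s]", "", name.lower())      # drop punctuation
    name = re.sub(r"\b(inc|ltd|corp|co)\b", "", name)  # drop suffixes (assumed list)
    return " ".join(name.split())

def reconcile(records):
    """Group records whose normalized names match, i.e. 'X and Y are the same thing'."""
    groups = {}
    for rec in records:
        groups.setdefault(normalize(rec), []).append(rec)
    return list(groups.values())

records = ["Yahoo! Inc.", "yahoo", "Google", "Google Inc", "google inc."]
for group in reconcile(records):
    print(group)
```

The toy version groups "Yahoo! Inc." with "yahoo", and the three Google spellings with each other; the hard part at web scale, which the interview alludes to, is doing this reliably when names alone are ambiguous.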

Alon Halevy leads the Structured Data research team at Google Research. Prior to that, Alon was a professor of Computer Science at the University of Washington, where he started the UW CSE Database Group in 1998 and worked in the field of data integration and web data.

BIG-member EXALEAD @ Big Data Paris Trade Show 2013 - April 3rd and 4th

BIG-Consortium partner EXALEAD will be present at the Big Data Paris Trade Show, which will take place on 3-4 April 2013. EXALEAD is an official sponsor and will have a booth at the conference.

The Big Data Paris Trade Show opens its doors again in 2013 at the CNIT in Paris La Défense. At the conference, theoretical and technological presentations will be accompanied by plenty of practical feedback from Big Data pioneers. Innovative projects will be presented, both in plenary sessions and in project workshops, linking concrete examples to theory and economic issues to more technological ones.

Another highlight of the event is an exhibition, which will give all those involved in the world of Big Data an opportunity to present their solutions and to meet their partners and clients. Additionally, this year’s trade show visitors will have access to a series of product workshops, where some of the suppliers present at the exhibition will present their products.

The detailed program of the conference and additional information on the venue, plenary sessions and workshops can be found here: http://www.bigdataparis.com/2013-uk-programme.php

The registration form is available here: http://www.bigdataparis.com/uk-registration.php

Big Data Analysis Interview with Hjalmar Gislason

Hjalmar Gislason is the founder and CEO of DataMarket.com. In this interview he covers the areas of data visualization and data modelling via semantics. He believes that simplicity of use is crucial to success and that many technologies, such as the Semantic Web stack, are over-engineered. According to him there is high demand for the "democratization of semantic technologies": making everything accessible through a web browser and dealing with legacy versions of IE.

The Flashmeeting is now online at: http://fm.ea-tel.eu/fm/fmm.php?pwd=467502-32874 and the MP3 Audio file is available at: http://fm.ea-tel.eu/flashmedia/flashmeeting/fm32874_467502-32874/mp3/aud...

DataMarket helps business users find and understand data, and data providers to efficiently publish their data and reach new audiences. DataMarket.com provides access to thousands of data sets holding hundreds of millions of facts and figures from a wide range of public and private data providers including the United Nations, the World Bank, Eurostat and the Economist Intelligence Unit. The data portal allows this data to be searched, visualized, compared and downloaded in a single place in a standard, unified manner. 

Big Data Analysis Interview with Andraz Tori, Founder and CTO of Zemanta

Andraz Tori

In this interview Andraz mainly covers the Hadoop framework, explains why it was successful, and provides interesting remarks on why the US currently seems to be doing better than Europe in Big Data technologies.

The Flashmeeting is available from here: http://fm.ea-tel.eu/fm/fmm.php?pwd=8d9a4d-32917 (note: the speakers' voices differ in volume, but this can be adjusted simply by clicking on our names in the bottom panel)

Or check the MP3 Audio file from:  http://fm.ea-tel.eu/flashmedia/flashmeeting/fm32917_8d9a4d-32917/mp3/audio.mp3

Andraz Tori is CTO and co-founder of Zemanta, a five-year-old company dealing with semantic analysis of text for the purposes of a personal writing assistant and general-purpose recommendations. In terms of Big Data, Andraz characterizes Zemanta as "small data" inside Big Data. The company operates on terabytes of compressed data, running CPU-intensive operations.



