
Data Storage

BIG Final Event Workshop

Programme of the BIG Final Event Workshop, co-located with ISC Big Data in Heidelberg


The BIG Project

Welcome and Introduction: Nuria De Lama (ATOS Spain)


Key Technology Trends for Big Data in Europe 

Edward Curry (Research Fellow at Insight @ NUI Galway) 



The Big Data Public Private Partnership 

Nuria De Lama (ATOS Spain)



Panel discussion about a common Big Data Stakeholder Platform 

Martin Strohbach (AGT International)  

● The PPP Stakeholder Platform

(Nuria De Lama)

● Hardware and Network for Big Data

(Ernestina Menasalvas, RETHINK BIG EU Project) 

● Tackling BIG DATA Externalities

(Kush Wadhwa, Trilateral Research BYTE EU Project) 

● The value of the Stakeholder platform

(Sebnem Rusitschka, Siemens, BIG and BYTE Project) 


Networking and Break-out Sessions


Update will follow.



D2.2.2 Final Version of Technical White Paper available

The final version of the technical whitepaper of deliverable 2.2.2 is available now. It details the results from the Data Value Chain technical working groups, describing the state of the art in each part of the chain together with emerging technological trends for exploiting Big Data. It consolidates the findings on Big Data challenges across the different sectors and working groups. The Data Value Chain identifies activities in Data Acquisition, Data Analysis, Data Curation, Data Storage and Data Usage.
A member of the BIG Health forum comments: "We interviewed experts in the biomedical domain to ask for their opinion about the current situation in data acquisition and data quality. We identified challenges that need to be addressed for establishing the basis for BIG health data applications. Additionally, the current data quality challenges were diagnosed and are reported in the deliverable."

The BIG project at Big Data World Congress, Munich, 3-4 December 2013

The BIG project had a strong presence at Big Data World Congress in Munich in early December. There was a strategically positioned stand in the exhibition hall. We met a number of delegates from many industrial sectors and countries, especially in the “speed dating” session, where we perfected the BIG project’s elevator pitch in the quick-fire conversations! Project flyers and stickers were available in many places for people who wanted to learn about the project after the conference. The two-day event was closed by a presentation from the BIG project’s director, Josema Cavanillas, introducing the aims of the project and the outputs of our research.
The event featured case studies and panels on every aspect of Big Data technologies including governance, unstructured data, real-time analytics and much more. Attendees came from a wide range of organisations, including some big players in sectors such as manufacturing and telecoms. One exciting potential avenue of collaboration may be for BIG to work with the USA’s NIST (National Institute of Standards and Technology) as they are also developing cross-sector consensus requirements and roadmaps for Big Data.
Many speakers talked about how adopting Big Data could revolutionise the way businesses operate, driving efficiency and faster product development. It is recognised by most, if not all, senior-level executives as one of the key IT trends of the next few years, but this comes with the caveat that Big Data initiatives need to be aligned to clear outcomes and business processes in order to have a chance of success. The structure of organisations may need to be adapted so that technical and business expertise can work together more closely and value can be derived from data. Even then, the pace of industry change may be such that organisations will look to form partnerships with start-ups and universities so as to drive innovation. The BIG project’s Public Private Forum could be a key enabler for these communities.
Europe-specific issues were highlighted in several talks. There was criticism of the apparent risk aversion of technology companies and their customers and the lack of a widespread start-up culture (apart from a few isolated exemplars). There are differences between Europe and the US in terms of data protection, the EU’s tougher legislation possibly being a barrier to innovation for some firms (on the other hand, the US’s relatively lax laws may have implications for privacy and the ethics of extensive data collection by businesses).

Interview with Andreas Ribbrock, Team Lead Big Data Analytics and Senior Architect at Teradata GmbH

The Big Data Analysis interview with Andreas Ribbrock, Team Lead Big Data Analytics and Senior Architect at Teradata GmbH, is online now.

In his interview, Andreas talked about three classes of technologies required for Big Data: storage (advocating distributed file systems as a competitive way to handle large data volumes); query frameworks which can translate user queries into calls to a set of different query engines (he calls this a 'discovery platform'); and a platform which can deliver the right results to the right personnel in the right time frame.

Andreas also stressed that integration is key, as Big Data cannot be solved by any single technology but requires a suite of tightly integrated technologies. In general, any architecture or framework for Big Data must be open and adaptable, so that new technologies and components can be plugged in. Fabric computing, in which components are virtualised and data flows between them at high speed, was suggested as a possible approach.

In terms of impact, two key drivers are the ability of Big Data to let companies personalise their communication with clients, and the way user communication channels will change. On the one hand, one can integrate channels for energy consumption, phone use and banking. On the other, users may prefer their own channels (which produce a lot of data) and impose these on enterprises in specific markets. For example, traditional banks may soon become obsolete as their functionality is taken over by PayPal (a Teradata customer), Amazon and Google.

He ended the interview with the phrase: "Big Data is Big Fun!"

BIG at LSWT2013 - From Big Data to Smart Data - A Summary

The 5th Leipziger Semantic Web Tag (LSWT2013) was organised as a meeting point for German as well as international Linked Data experts.
Under the motto "From Big Data to Smart Data", sophisticated methods for handling large amounts of data were presented on 23 September in Leipzig.
The keynote was given by Hans Uszkoreit, scientific director at the German Research Center for Artificial Intelligence (DFKI). After an introduction to text analytics and Big Data issues, the participants of LSWT 2013 discussed the intelligent use of huge amounts of data on the web.
Presentations on industrial and scientific solutions showed working answers to Big Data concerns. Companies like Empolis, Brox and Ontos presented Linked Data and Semantic Web solutions capable of handling terabytes of data. More traditional approaches, such as Datameer’s Hadoop-based Data Analytics Solution, showed that Big Data can nowadays be handled without major problems.
Furthermore, problems of detecting topics in massive data streams (Topic/S), document collections (WisARD) or corpora at information service providers (Wolters Kluwer) were tackled. Even the ethical issue of robots replacing journalists with the help of semantic data was examined by Alexander Siebert from Retresco.
In conclusion, the analysis of textual information in large amounts of data is an interesting and not yet fully solved area of work. Further information is available from the website.
Further information on topics related to data analysis, data curation, data storage, data acquisition and data usage can be found in our technical whitepaper, available from our project website.

Big Data Analysis Interview with Steve Harris, Chief Technology Officer at Garlik, an Experian Company


The new Big Data Analysis interview with Steve Harris, Chief Technology Officer at Garlik, an Experian Company, is available now.

Steve's company focuses on the prediction and detection of financial fraud through the use of a customised RDF store and SPARQL. They harvest several terabytes of raw data from chat rooms and forums associated with hackers and generate around 1 billion RDF triples from it. In terms of areas that need work, Steve's suggestion was optimising the performance of these stores. We also discussed the need to make sure that the infrastructure is economically viable; training staff to use RDF and SPARQL turned out not to be a big issue.
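The triple-store approach Steve describes can be sketched in miniature: facts stored as RDF-style (subject, predicate, object) triples and queried with a SPARQL-like basic graph pattern. This is a toy illustration with invented identifiers, not Garlik's actual 5store engine:

```python
# Toy triple store: harvested facts as (subject, predicate, object) tuples.
# All identifiers below are hypothetical examples.
triples = [
    ("forum:post42", "mentions", "card:1234"),
    ("forum:post42", "postedIn", "chatroom:alpha"),
    ("forum:post99", "mentions", "card:5678"),
]

def match(pattern, store):
    """Return every triple matching the pattern; None acts as a wildcard,
    like an unbound variable in a SPARQL basic graph pattern."""
    s, p, o = pattern
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Which posts mention a card number? (?post mentions ?card)
hits = match((None, "mentions", None), triples)
```

A real SPARQL engine adds joins across patterns, indexing and query optimisation on top of exactly this kind of pattern matching, which is why store performance is the bottleneck Steve highlights.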

Steve Harris leads the design and development of a multi-million-user product in the financial services industry at Garlik, an Experian company. In the Semantic Web community, he is widely regarded as the architect of Garlik's open-source, scalable RDF platform, 5store, and has served on the World Wide Web Consortium (W3C) working groups that defined the SPARQL query language [1].

Big Data Analysis Interview with Ricardo Baeza-Yates, VP of Research for Europe and Latin America at Yahoo!

Ricardo Baeza-Yates

Check out the new Big Data Analysis interview with Ricardo Baeza-Yates, VP of Research for Europe and Latin America at Yahoo!



The main themes Ricardo suggested investing in are:

a) what he called "Hadoop++": the ability to handle graphs with trillions of edges, as MapReduce doesn't scale well for graphs; and b) stream data mining: the ability to handle streams of large volumes of data. Handling lots of data in a 'reasonable' amount of time is key for Ricardo: for example, being able to carry out offline computations within a week rather than a year.
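Stream data mining of the kind Ricardo describes depends on algorithms that see each item once and use bounded memory. Reservoir sampling is one classic building block (a generic sketch, not something taken from the interview):

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown
    length, in one pass and O(k) memory: item i replaces a slot with
    probability k/(i+1), so every item ends up equally likely to survive."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = random.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

# Sample 10 items from a million-element stream without storing it.
sample = reservoir_sample(range(1_000_000), 10)
```

The same one-pass, bounded-memory pattern underlies heavy-hitter counting, sketches and the other stream-mining techniques Ricardo alludes to.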

An additional point of interest for Ricardo was personalisation and its relation to privacy: rather than personalising based on user data, we should personalise around user tasks. More details in the interview!

Ricardo Baeza-Yates is VP of Research for Europe and Latin America, leading the Yahoo! Research labs at Barcelona, Spain and Santiago, Chile, and also supervising the lab in Haifa, Israel. Until 2005 he was the director of the Center for Web Research at the Department of Computer Science of the Engineering School of the University of Chile; and ICREA Professor and founder of the Web Research Group at the Dept. of Information and Communication Technologies of Universitat Pompeu Fabra in Barcelona, Spain [1].

Big Data Analysis Interview with Peter Mika, Senior Scientist at Yahoo! Research Labs in Barcelona

New Big Data analysis interview with Peter Mika is out:

Within the standard interface

As an MP3 audio file

As a small video-only window

As a large video window with ability to jump to a specific segment

The main theme for Peter was using machine learning, information extraction and Semantic Web technologies to reduce Big Data into more manageable chunks, and how, combined with new programming paradigms such as Hadoop, we can now accomplish more. Background knowledge (even in a simple form) enables Yahoo! to understand that "Brad Pitt Fight Club" is a search for a movie featuring Brad Pitt, a case of disambiguation.
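The "Brad Pitt Fight Club" example can be sketched as a toy lookup: a few hand-written background facts let the query be interpreted as a movie search. The data and logic below are invented for illustration and bear no relation to Yahoo!'s actual system:

```python
# Hypothetical background knowledge: a tiny entity catalogue.
entities = {
    "brad pitt": {"type": "person", "profession": "actor"},
    "fight club": {"type": "movie", "cast": ["brad pitt"]},
}

def interpret(query):
    """Spot known entities in the query, then prefer a movie interpretation
    when the query also names someone in that movie's cast."""
    q = query.lower()
    found = [name for name in entities if name in q]
    for name in found:
        e = entities[name]
        if e["type"] == "movie" and any(p in found for p in e.get("cast", [])):
            return {"intent": "movie", "title": name}
    return {"intent": "unknown", "entities": found}

result = interpret("Brad Pitt Fight Club")
```

Even this crude linking step shows how background knowledge shrinks the problem: instead of matching raw strings against the whole web, the engine reasons over a handful of candidate entities.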

Peter Mika is a researcher working on the topic of semantic search at Yahoo Research in Barcelona, Spain. Peter also serves as a Data Architect for Yahoo Search, advising the team on issues related to knowledge representation. Peter is a frequent speaker at events, a regular contributor to the Semantic Web Gang podcast series and a blogger at tripletalk.wordpress.com [1].

Yet another way to define Big Data

What is Big Data? Is 1 Petabyte considered Big Data? Maybe 10 Petabytes?

According to Wikipedia, "big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications". OK, but is Hadoop a traditional data processing application or not? You know, it has been around for almost 5 years…

According to Gartner, "Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." But then again: what is high-volume and high-velocity? Is a 1 Petabyte data set considered a high-volume information asset or not?

Some will say that the size of Big Data is a moving target; others will claim that Big Data is any data set that doesn't fit into one computer's memory. And though these last two definitions capture the fluid nature of Big Data, they are not sufficiently formal.

From the very beginning of the computer industry, data sets grew bigger and bigger, and IT departments were always concerned about insufficient resources to support this growth. So what has changed now to create the "Big Data" buzz?

It seems that what makes Big Data Big Data is not some size threshold or some velocity threshold, but rather a ratio. What ratio? The ratio between data volume, data velocity and the hardware available. If we could expect a CPU 1000 times faster next year, and memory 1000 times bigger and faster next year, would we care about a 40-60% annual growth rate in data set size? Probably not. So the ratio is the ratio between the data growth rate and the hardware growth rate: specifically, CPU performance, storage and memory capacity, and storage and memory speed. Under this definition, and taking into account the hardware growth rate of the last 10 years (which is slower than in the 90s) and the data growth rate, which was very high over the last 10 years, it becomes much clearer why 15-20 years ago we didn't hear about Big Data and now we do. So maybe instead of saying Big Data we should say Small Hardware? Or maybe both?
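The ratio argument can be made concrete with a back-of-the-envelope calculation; the 50% and 25% annual growth rates below are illustrative assumptions, not measured figures:

```python
# Illustrative assumptions only, not measured figures.
data_growth = 1.50   # data volume assumed to grow 50% per year
hw_growth = 1.25     # hardware capability assumed to grow 25% per year
years = 10

# Factor by which data outgrows hardware over the period:
# (1.50 / 1.25) ** 10 = 1.2 ** 10, roughly a six-fold gap.
gap = (data_growth / hw_growth) ** years
```

Under these assumptions the hardware deficit compounds year on year, which is exactly the "Small Hardware" effect: neither rate alone matters, only their ratio.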


Big Data Analysis Interview with Alon Halevy, Research Scientist at Google

Alon Halevy

The Big Data Analysis interview with Alon Halevy, Research Scientist at Google, is now online:

As an MP3 audio file

As a small video-only window

As a large video with ability to jump to a specific segment

Standard interface

In the interview, Alon drew upon his work on Google Fusion Tables, which allows users to upload and store their datasets. A collection of technologies which are not necessarily new, but are now beginning to work at scale, is having an impact. These include: reconciling entities (saying X and Y are the same thing), resolving schema and ontology differences, extracting high-quality data from the web, large ontology-based datasets (typically built from Wikipedia), and crowdsourcing computation.
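Entity reconciliation, in its very simplest form, can be sketched as normalising names before comparing them. This is only a toy example; systems at the scale Alon describes combine far richer signals than string normalisation:

```python
import re

def normalize(name):
    """Crude canonical form: lowercase, strip punctuation and a few
    common legal suffixes, collapse whitespace."""
    n = re.sub(r"[^\w\s]", "", name.lower())
    n = re.sub(r"\b(inc|ltd|corp|co)\b", "", n)
    return " ".join(n.split())

def same_entity(a, b):
    """Say a and b refer to the same thing if they normalise identically."""
    return normalize(a) == normalize(b)

# "Garlik, Ltd." and "garlik ltd" reconcile to the same entity.
match = same_entity("Garlik, Ltd.", "garlik ltd")
```

Real reconciliation pipelines layer fuzzy matching, attribute comparison and learned models on top, but the core step is the same: map surface forms into a space where equality is meaningful.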

Alon Halevy leads the Structured Data research team at Google Research. Prior to that, he was a professor of Computer Science at the University of Washington, where he started the UW CSE Database Group in 1998 and worked in the field of data integration and web data.
