
BIG at LSWT2013 - From Big Data to Smart Data - A Summary

The 5th Leipziger Semantic Web Tag (LSWT2013) was organized as a meeting point for German as well as international Linked Data experts.
Under the motto "From Big Data to Smart Data", sophisticated methods for handling large amounts of data were presented on September 23rd in Leipzig.
The keynote was held by Hans Uszkoreit, scientific director at the German Research Center for Artificial Intelligence (DFKI). After being introduced to text analytics and Big Data issues, the participants of LSWT2013 discussed the intelligent usage of huge amounts of data on the web.
 
Presentations on industrial and scientific solutions showed working approaches to Big Data concerns. Companies like Empolis, Brox and Ontos presented Linked Data and Semantic Web solutions capable of handling terabytes of data. More traditional approaches, like Datameer's Hadoop-based Data Analytics Solution, likewise showed that Big Data can nowadays be handled without major problems.
 
Furthermore, presentations tackled the problems of detecting topics in massive data streams (Topic/S), in document collections (WisARD) and in corpora at information service providers (Wolters Kluwer). Even the ethical issue of robots replacing journalists with the help of semantic data was examined by Alexander Siebert from Retresco.
 
In conclusion, the analysis of textual information in large amounts of data is an interesting and not yet fully solved area of work. Further information is available from the website.
 
Further information on topics related to data analysis, data curation, data storage, data acquisition and data usage can be found in our technical whitepaper, available from our project website.

Big Data Analysis Interview with Steve Harris Chief Technology Officer at Garlik, an Experian Company


Check out the new Big Data Analysis interview with Steve Harris, Chief Technology Officer at Garlik, an Experian Company:

The company Steve is associated with focuses mainly on the prediction and detection of financial fraud through the use of their customised RDF store and SPARQL. They harvest several terabytes of raw data from chat rooms and forums associated with hackers and generate around 1B RDF triples from it. In terms of areas that need work, Steve's suggestion was optimising the performance of these stores. We also discussed the need to make sure that the infrastructure is economically viable and that training staff to use RDF/SPARQL is not a big issue.
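At the heart of the SPARQL queries Garlik runs over its triples is basic graph pattern matching. The stdlib-only sketch below illustrates that idea on a handful of invented triples; the predicate names and data are made up for the example and are not Garlik's actual schema.

```python
# Toy in-memory triple store illustrating the basic-graph-pattern
# matching at the core of SPARQL evaluation. All names are invented.

def match(triples, pattern, binding=None):
    """Yield variable bindings for one (s, p, o) pattern.
    Variables are strings starting with '?'."""
    binding = binding or {}
    for s, p, o in triples:
        b = dict(binding)
        ok = True
        for term, value in zip(pattern, (s, p, o)):
            if term.startswith("?"):
                if term in b and b[term] != value:
                    ok = False
                    break
                b[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            yield b

def query(triples, patterns):
    """Join several patterns, like a SPARQL WHERE clause."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(triples, pattern, b)]
    return bindings

# Example data: fraud-related chatter harvested into triples.
triples = [
    ("user:alice", "posted_in", "forum:carding"),
    ("user:alice", "mentions", "card:1234"),
    ("user:bob", "posted_in", "forum:general"),
]

# "Who posted in forum:carding, and which card did they mention?"
result = query(triples, [
    ("?u", "posted_in", "forum:carding"),
    ("?u", "mentions", "?card"),
])
# result: [{"?u": "user:alice", "?card": "card:1234"}]
```

A production store like 5store adds indexing, distribution and query optimisation on top of this basic matching, which is where the performance work Steve mentions comes in.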

Steve Harris leads the design and development of a multi-million-user product in the financial services industry at Garlik, an Experian Company. In the Semantic Web community, he is widely regarded as the architect of Garlik's open-source, scalable RDF platform, 5store, and has served on the World Wide Web Consortium (W3C) working groups that defined the SPARQL query language [1].

Big Data Analysis Interview with Ricardo Baeza-Yates VP of Research for Europe and Latin America at Yahoo!


Check out the new Big Data analysis interview with Ricardo Baeza-Yates, VP of Research for Europe and Latin America at Yahoo!

The main themes Ricardo suggested investing in are:

a) what he called "Hadoop++": the ability to handle graphs with trillions of edges, as MapReduce doesn't scale well for graphs; and b) stream data mining: the ability to handle streams of large volumes of data. Handling lots of data in a 'reasonable' amount of time is key for Ricardo - for example, being able to carry out offline computations within a week rather than a year.
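A classic example of the stream-mining direction Ricardo describes is finding frequent items in a single pass with bounded memory. Below is a minimal sketch of the Misra-Gries heavy-hitters algorithm; it is a standard textbook technique, not something tied to any specific system mentioned in the interview.

```python
def heavy_hitters(stream, k):
    """Misra-Gries sketch: one pass, O(k) memory. Any item occurring
    more than len(stream)/k times is guaranteed to survive as a
    candidate; the stored counts are approximate lower bounds."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement every counter; drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# 'a' occurs 5 times in this 9-item stream, i.e. more than 9/2,
# so with k = 2 it is guaranteed to appear among the candidates.
candidates = heavy_hitters(list("aabacbaad"), 2)
```

The point is the memory bound: at most k-1 counters are kept no matter how long the stream runs, which is what makes such sketches viable for the unbounded data streams Ricardo has in mind.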

Another point of interest for Ricardo was personalisation and its relation to privacy. Rather than personalising based on user data, we should personalise around user tasks. More details in the interview!

Ricardo Baeza-Yates is VP of Research for Europe and Latin America, leading the Yahoo! Research labs at Barcelona, Spain and Santiago, Chile, and also supervising the lab in Haifa, Israel. Until 2005 he was director of the Center for Web Research at the Department of Computer Science of the Engineering School of the University of Chile, and ICREA Professor and founder of the Web Research Group at the Dept. of Information and Communication Technologies of Universitat Pompeu Fabra in Barcelona, Spain [1].

Big Data Analysis Interview with Peter Mika, Senior Scientist at Yahoo! Research Labs in Barcelona

New Big Data analysis interview with Peter Mika is out:

Within the standard interface

As an MP3 audio file

As a small video-only window

As a large video window with ability to jump to a specific segment

The main theme for Peter was using machine learning, information extraction and Semantic Web technologies to reduce Big Data into more manageable chunks, and how, combined with new programming paradigms such as Hadoop, we can now accomplish more. Background knowledge (in a simple form) enables Yahoo! to understand that "Brad Pitt Fight Club" is a search for a movie featuring Brad Pitt, an example of disambiguation.
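As a heavily simplified illustration of how background knowledge can disambiguate such a query, the sketch below links query phrases against a tiny knowledge base by greedy longest-match. The knowledge base entries and the linking strategy are assumptions made for the example, not Yahoo!'s actual approach.

```python
# Tiny illustrative knowledge base: entity phrase -> type.
KB = {
    "brad pitt": "actor",
    "fight club": "film",
    "edward norton": "actor",
}

def annotate(query):
    """Greedy longest-match entity linking over a query string."""
    words = query.lower().split()
    spans = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in KB:
                spans.append((phrase, KB[phrase]))
                i = j
                break
        else:
            i += 1  # no entity starts here; move on
    return spans

spans = annotate("Brad Pitt Fight Club")
# spans: [('brad pitt', 'actor'), ('fight club', 'film')]
```

Once the query is segmented into typed entities, the engine can infer the likely intent (a film involving an actor) instead of treating the text as four unrelated keywords.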

Peter Mika is a researcher working on the topic of semantic search at Yahoo Research in Barcelona, Spain. Peter also serves as a Data Architect for Yahoo Search, advising the team on issues related to knowledge representation. Peter is a frequent speaker at events, a regular contributor to the Semantic Web Gang podcast series and a blogger at tripletalk.wordpress.com [1].

Big Data Analysis Interview with Alon Halevy Research Scientist at Google


Big Data Analysis Interview with Alon Halevy Research Scientist at Google is now online:

As an MP3 audio file

As a small video-only window

As a large video with ability to jump to a specific segment

Standard interface

In the interview, Alon drew upon his work on Google Fusion Tables, which allows users to upload and store their datasets. A collection of technologies which are not necessarily new, but are now beginning to work at scale, are having an impact. These include: reconciling entities (saying X and Y are the same thing), resolving schema and ontology differences, extracting high-quality data from the web, large ontology-based datasets (typically built from Wikipedia), and crowdsourcing computation.
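As a toy illustration of entity reconciliation ("saying X and Y are the same thing"), one naive approach is to compare normalized name strings and accept pairs above a similarity threshold. The threshold and example names below are invented for the sketch; systems working at the scale Alon describes add blocking, attribute evidence and learned matchers on top of anything this simple.

```python
from difflib import SequenceMatcher

def reconcile(names_a, names_b, threshold=0.8):
    """Naive entity reconciliation: declare two names the same entity
    when their normalized string similarity clears a threshold."""
    matches = []
    for a in names_a:
        for b in names_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches

pairs = reconcile(
    ["International Business Machines", "Google Inc."],
    ["Google Inc", "Intl Business Machines"],
)
```

Even this crude matcher shows the shape of the problem: the interesting work is in choosing the similarity signal and threshold so that near-duplicates merge without conflating genuinely distinct entities.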

Alon Halevy leads the Structured Data research team at Google Research. Prior to that, Alon was a professor of Computer Science at the University of Washington, where he started the UW CSE Database Group in 1998 and worked in the field of data integration and web data.
