Big Data, Open Data and Data Development
By Jean-Louis Monino and Soraya Sedkaoui
About this ebook
The world has become digital, and technological advances have multiplied the channels for accessing, processing and disseminating data. New technologies have now reached a certain maturity. Data is available to everyone, anywhere on the planet. The number of Internet users in 2014 was 2.9 billion, or 41% of the world population. The need for knowledge is becoming apparent in order to understand this multitude of data. We must educate, inform and train the masses. The development of related technologies, such as the Internet, social networks and cloud computing (digital factories), has increased the available volumes of data. Today, each individual creates, consumes and uses digital information: more than 3.4 million e-mails are sent worldwide every second, or 107,000 billion annually, which is about 14,600 e-mails per person per year, although more than 70% are spam. Billions of pieces of content are shared on social networks such as Facebook, more than 2.46 million every minute. We spend more than 4.8 hours a day on the Internet using a computer, and 2.1 hours using a mobile device. Data, this new ethereal manna from heaven, is produced in real time. It comes in a continuous stream from a multitude of sources which are generally heterogeneous.
This accumulation of data of all types (audio, video, files, photos, etc.) generates new activities, the aim of which is to analyze this enormous mass of information. It is therefore necessary to adapt and try new approaches, new methods, new knowledge and new ways of working, resulting in new properties and new challenges, since a new logic for ordering knowledge must be created and implemented. At company level, this mass of data is difficult to manage, and interpreting it is the primary challenge. This affects those who have to "manipulate" the mass and requires a specific infrastructure for creation, storage, processing, analysis and retrieval. The biggest challenge lies in the "valorization" of data that is available in quantity, in diversity and at speed.
Table of Contents
Cover
Title
Copyright
Acknowledgements
Foreword
Key Concepts
Introduction
I.1. The power of data
I.2. The rise of buzzwords related to data (Big, Open, Viz)
I.3. Developing a culture of openness and data sharing
1 The Big Data Revolution
1.1. Understanding the Big Data universe
1.2. What changes have occurred in data analysis?
1.3. From Big Data to Smart Data: making data warehouses intelligent
1.4. High-quality information extraction and the emergence of a new profession: data scientists
1.5. Conclusion
2 Open Data: A New Challenge
2.1. Why Open Data?
2.2. A universe of open and reusable data
2.3. Open Data and the Big Data universe
2.4. Data development and reuse
2.5. Conclusion
3 Data Development Mechanisms
3.1. How do we develop data?
3.2. Data governance: a key factor for data valorization
3.3. CI: protection and valuation of digital assets
3.4. Techniques of data analysis: data mining/text mining
3.5. Conclusion
4 Creating Value from Data Processing
4.1. Transforming the mass of data into innovation opportunities
4.2. Creation of value and analysis of open databases
4.3. Value creation of business assets in web data
4.4. Transformation of data into information or DataViz
4.5. Conclusion
Conclusion
Bibliography
Index
End User License Agreement
List of Tables
1 The Big Data Revolution
Table 1.1. Data units of measurement
2 Open Data: A New Challenge
Table 2.1. Open data in five stages.
4 Creating Value from Data Processing
Table 4.1. The 50 most innovative companies in 2014
List of Illustrations
Introduction
Figure I.1. Relationship between data, information and knowledge [MON 06]
Figure I.2. The hierarchic model: data, information, and knowledge [MON 06]
Figure I.3. Web searches on Big Data and Open Data 2010–13 according to Google Trends. For a color version of the figure, see www.iste.co.uk/monino/data.zip
Example I.1. The startup E-PROSPECTS
Example I.2. Information processing C2i certificate security and massive processing by QRCode
Example I.3. Data mining and Statistica software
Example I.4. An example from France’s Bouches-du-Rhône Administrative Department and from the city of Montpellier
Example I.5. Netflix
Example I.6. INSEE and sectorization
Example I.7. A startup population tracking app
Example I.8. Open Data in the city of Rennes
Example I.9. Data Publica and C-RADAR
1 The Big Data Revolution
Figure 1.1. Diversity of data sources
Figure 1.2. The importance of data scientists.
Example 1.1. A sales receipts analysis by Wal-Mart
Example 1.2. Book suggestions for Amazon customers
Example 1.3. An ecosystem provided by Nike
Example 1.4. The development of storage capacities
Example 1.5. Two examples of open source data
2 Open Data: A New Challenge
Figure 2.1. Open Data: history
Figure 2.2. Open Data platform growth in France.
Example 2.1. Data journalism
Example 2.2. Open Data and governance
3 Data Development Mechanisms
Figure 3.1. A model of economic intelligence [MON 12]
Example 3.1. Data centres or digital factories
Example 3.2. An example of data processing for the startup 123PRESTA in 2010. For a color version of the figure, see www.iste.co.uk/monino/data.zip
Example 3.3. Forecasting of time series using neural networks, the turnover of large retail stores. For a color version of the figure, see www.iste.co.uk/monino/data.zip
Example 3.4. Chaos, Hurst exponents and bootstrap. An example applied to the Paris Stock Exchange [MAT 05]
Example 3.5. Short videos presenting CI
Example 3.6. A base of e-prospects with Statistica as an example of data processing
4 Creating Value from Data Processing
Figure 4.1. Companies actively targeting Big Data in their innovation programs (over the next 3–5 years).
Figure 4.2. Massive data processing and results visualization
Figure 4.3. The 3 phases of opening up data. Source: [SAW 12] for the European Space Agency. For a color version of the figure, see www.iste.co.uk/monino/data.zip
Figure 4.4. Evolution of Linked Open Data. For a color version of the figure, see www.iste.co.uk/monino/data.zip
Figure 4.5. Web of the future. Source: N. Spivack, The Future of the Net, 2004, available at http://novaspivack.typepad.com/nova_spivacks_weblog/2004/04/new_version_of_.html
Example 4.1. The Google car
Example 4.2. Smart City - Montpellier
Example 4.3. An application on a transport company
Example 4.4. OpenKnowledge Foundation
Example 4.5. A route calculation service for people with reduced mobility
Example 4.6. The SNCF and the reuse of data
Example 4.7. Orange and the site "Where do you really live?"
Example 4.8. Clouds of texts or words
Example 4.9. Three-dimensional visualization. For a color version of the figure, see www.iste.co.uk/monino/data.zip
Example 4.10. Bipartite e-graph
Conclusion
Figure C.1. Data governance model in the age of data revolution developed by Monino and Sedkaoui
Smart Innovation Set
coordinated by
Dimitri Uzunidis
Volume 3
Big Data, Open Data and Data Development
Jean-Louis Monino
Soraya Sedkaoui
First published 2016 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk
John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd 2016
The rights of Jean-Louis Monino and Soraya Sedkaoui to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2016931678
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-84821-880-2
Acknowledgements
This book is the product of several years of research devoted to data processing, statistics and econometrics in the TRIS (traitement et recherche de l’information et de la statistique) laboratory. It is the fruit of several projects carried out within the framework of research and development for several startups within the Languedoc-Roussillon region and large private and public groups.
I would like to thank all of the members of the RRI (réseau de recherche sur l’innovation), and more particularly Dimitri Uzunidis, its president, for his attentive and careful reading of the first version and for encouraging us to publish this book.
Thanks also to Mr. Bernard Marques, who had the difficult task of proofreading the manuscript and whose many important notes have made the book easier to understand.
I would also like to thank my teacher and friend Jean Matouk, who was the driving force behind this publication, for his encouragement and unfailing support over the years.
Many thanks to all the researchers at the laboratory for their help and support and most especially to Soraya Sedkaoui; without her this book would never have seen the light of day.
Thanks to all those who have supported me through difficult times and who have transformed an individual intellectual adventure into a collective one, in particular Alain Iozzino, director of the startup E-prospects, with whom we have carried out many research and development projects over the years.
Finally, I must express my special gratitude to those dear to me, my family, and most of all to my wife, who has had to put up with my moods over the last few years.
Jean-Louis MONINO
This book was a work of adaptation, updating and rewriting that draws together all of the work of the TRIS laboratory. Its creation was fed by exchanges and discussions with my teacher Jean-Louis Monino, without whom this book would never have seen the light of day. I am infinitely grateful to him for having included me in this adventure.
It would not have been possible to produce this book without my family, who have always encouraged and supported me in all my ideas and projects, no matter how far-fetched they have sometimes been. Special mention must go to my mother: there are no words to express how important she is and how much she has done to make me what I am today.
Finally, I would like to thank Hans-Werner Gottinger, Mohamed Kasmi and Mustapha Ghachem for their unfailing support and for the interest that they have always shown in what I am doing.
Soraya SEDKAOUI
Foreword
The world has become a digitalized place, and technological advancements have multiplied the ways of accessing, processing and disseminating data. Today, new technologies have reached a point of maturity. Data is available to everyone throughout the planet. In 2014, the number of Internet users in the world was 2.9 billion, which is 41% of the world population. The thirst for knowledge can be perceived in the drive to seize this wealth of data. There is a need to inquire, inform and develop data on a massive scale. The boom in networking technologies – including the advent of the Internet, social networks and cloud computing (digital factories) – has greatly increased the volume of data available. As individuals, we create, consume and use digital information: each second, more than 3.4 million emails are sent throughout the world. That is the equivalent of 107,000 billion emails per year, with over 14,600 per person per year, although more than 70% of them are junk mail. Millions of links are shared on social networks, such as Facebook, with over 2.46 million shares every minute. The average time spent on the Internet is over 4.8 hours per day on a computer and 2.1 hours on a cellphone. The new immaterial substance of data is produced in real-time. It arrives in a continuous stream flowing from a variety of generally heterogeneous sources.
This shared pool of all kinds of data (audio, video, files, photos, etc.) is the site of new activities aimed at analyzing the enormous mass of information. It thus becomes necessary to adapt and develop new approaches, methods, forms of knowledge and ways of working, all of which involve new paradigms and stakes as a new ordering system of knowledge must be created and put into place. For most companies, it is difficult to manage this massive amount of data. The greatest challenge is interpreting it. This is especially a challenge for those companies that have to use and implement this massive volume of data, since it requires a specific kind of infrastructure for the creation, storage, treatment, analysis and recovery of the same. The greatest challenge resides in developing the available data in terms of quality, diversity and access speed.
Alain IOZZINO
E-PROSPECTS Manager
January 2016
Key Concepts
Before launching into the main text of this book, we have found it pertinent to recall the definitions of some key concepts. Needless to say, the following list is not exhaustive:
– Big Data: The term Big Data is used when the amount of data that an organization has to manage reaches a critical volume that requires new technological approaches in terms of storage, processing, and usage. Volume, speed, and variety are usually the three criteria used to qualify a database as Big Data.
– Cloud computing: This term designates a set of processes that use computational and/or storage capacities from remote servers connected through a network, usually the Internet. This model allows access to the network on demand. Resources are shared and computational power is configured according to requirements.
– Competitive intelligence: The set of coordinated information gathering, processing and dissemination activities useful to economic actors. According to the Martre Report, competitive intelligence can be defined as the set of coordinated actions for researching, processing and disseminating information so that economic actors can exploit it. This diverse set of actions is carried out legally with all data protection guarantees necessary to preserve the company’s assets, with the highest regard to quality, deadlines and cost. Useful information is needed at the company or partnership’s different decision-making levels in