Welcome!

Industrial IoT Authors: Elizabeth White, Stackify Blog, Yeshim Deniz, SmartBear Blog, Liz McMillan

Blog Feed Post

The Structure of Big Data

First things first, all data is more or less structured. That being said, there is…

  • Structured Data
  • Semi-Structured Data
  • Unstructured Data

I tend to think of it as: data, composite or simple, with or without content. In that context, email is structured composite data (from, to, subject, date) with unstructured content (message body). The composite data is structured. The content is unstructured. Though simple data may or may not be structured. The ‘subject’ data is unstructured. The ‘to’ data is structured. It is composed of a local-part (username) and a domain.

While content is unstructured, there may be an implied structure.

So what is the difference between structured data, semi-structured data, and unstructured data?

Structured Data

The structure is externally enforced. Data

  • The data is stored in a database.
    • Relational
      • Transaction
    • XML
      • Catalog

The data itself is not structured. The structure is defined by the database. The data would be semi-structured if it was exported and transformed into JSON or XML.

Semi-Structured Data

The structure is self defined. Data and / or Content

  • The data is stored as text.
    • JSON
      • User Profile
    • XML
      • Application / Form

Unstructured Data

The structure of the data is externally defined. Metadata and Content

  • The data is stored in a binary format and / or document.
    • Media (e.g. Ogg).
    • Video (e.g. Vorbis).
    • Audio (e.g. Theora).
    • Image (e.g. PNG).
    • Microsoft Word
    • Adobe PDF

The structure is defined by the file format. The data is composed of structured metadata and unstructured content. That being said, a video is composed of frames and an image is composed of pixels.

  • The data is stored as plain text.
    • Log

The structure is defined by a pattern in the logging configuration file. The data is composed of structured metadata (e.g. severity) and unstructured content (log message).

  • The data is user generated.
    • Status (Facebook)
    • Tweet (Twitter)
    • Comment (WordPress)

The structure is defined by the application / form. The data is composed of structured metadata (e.g. user ID) and unstructured content (user message).

Update

I would say that the structure of content is user defined and thus interpreted. However, content is often a component of data (unstructured, externally defined). Though if I typed up this post in gedit and saved it as a text file, that might constitute content independent of data.


With all all that said, the structure of data is not exactly black and white.


Read the original blog entry...

More Stories By Daniel Thompson

I curate the content on this page, but the credit goes to my talented colleagues for the posts that you see here. Much of what you read on this page is the work of friends at How to JBoss, and I encourage you to drop by the site at http://www.howtojboss.com for some of the best JBoss technical and non-technical content for developers, architects and technology executives on the Web.

IoT & Smart Cities Stories
Cell networks have the advantage of long-range communications, reaching an estimated 90% of the world. But cell networks such as 2G, 3G and LTE consume lots of power and were designed for connecting people. They are not optimized for low- or battery-powered devices or for IoT applications with infrequently transmitted data. Cell IoT modules that support narrow-band IoT and 4G cell networks will enable cell connectivity, device management, and app enablement for low-power wide-area network IoT. B...
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform and how we integrate our thinking to solve complicated problems. In his session at 19th Cloud Expo, Craig Sproule, CEO of Metavine, demonstrated how to move beyond today's coding paradigm and sh...
What are the new priorities for the connected business? First: businesses need to think differently about the types of connections they will need to make – these span well beyond the traditional app to app into more modern forms of integration including SaaS integrations, mobile integrations, APIs, device integration and Big Data integration. It’s important these are unified together vs. doing them all piecemeal. Second, these types of connections need to be simple to design, adapt and configure...
Cloud computing delivers on-demand resources that provide businesses with flexibility and cost-savings. The challenge in moving workloads to the cloud has been the cost and complexity of ensuring the initial and ongoing security and regulatory (PCI, HIPAA, FFIEC) compliance across private and public clouds. Manual security compliance is slow, prone to human error, and represents over 50% of the cost of managing cloud applications. Determining how to automate cloud security compliance is critical...
Contextual Analytics of various threat data provides a deeper understanding of a given threat and enables identification of unknown threat vectors. In his session at @ThingsExpo, David Dufour, Head of Security Architecture, IoT, Webroot, Inc., discussed how through the use of Big Data analytics and deep data correlation across different threat types, it is possible to gain a better understanding of where, how and to what level of danger a malicious actor poses to an organization, and to determin...
Nicolas Fierro is CEO of MIMIR Blockchain Solutions. He is a programmer, technologist, and operations dev who has worked with Ethereum and blockchain since 2014. His knowledge in blockchain dates to when he performed dev ops services to the Ethereum Foundation as one the privileged few developers to work with the original core team in Switzerland.
Digital Transformation and Disruption, Amazon Style - What You Can Learn. Chris Kocher is a co-founder of Grey Heron, a management and strategic marketing consulting firm. He has 25+ years in both strategic and hands-on operating experience helping executives and investors build revenues and shareholder value. He has consulted with over 130 companies on innovating with new business models, product strategies and monetization. Chris has held management positions at HP and Symantec in addition to ...
Cloud-enabled transformation has evolved from cost saving measure to business innovation strategy -- one that combines the cloud with cognitive capabilities to drive market disruption. Learn how you can achieve the insight and agility you need to gain a competitive advantage. Industry-acclaimed CTO and cloud expert, Shankar Kalyana presents. Only the most exceptional IBMers are appointed with the rare distinction of IBM Fellow, the highest technical honor in the company. Shankar has also receive...
DXWorldEXPO LLC announced today that Telecom Reseller has been named "Media Sponsor" of CloudEXPO | DXWorldEXPO 2018 New York, which will take place on November 11-13, 2018 in New York City, NY. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
Enterprises have taken advantage of IoT to achieve important revenue and cost advantages. What is less apparent is how incumbent enterprises operating at scale have, following success with IoT, built analytic, operations management and software development capabilities - ranging from autonomous vehicles to manageable robotics installations. They have embraced these capabilities as if they were Silicon Valley startups.