Click here to close now.

Welcome!

XML Authors: AppDynamics Blog, Pat Romanski, Elizabeth White, Carmen Gonzalez, Liz McMillan

Related Topics: XML

XML: Article

Open Source Database Special Feature: An Introduction to Berkeley DB XML

Basic concepts, the shell commands, and beyond

In this article I am going to introduce you to the latest version of the Berkeley DB XML, version 2.2.8. Berkeley DB XML (BDB XML) is built on top of the well-known Berkeley Database (BDB). BDB XML is an open source, native XML database. Like its ancestor, BDB, it's an embedded database. It provides APIs for the Java, C++, Perl, Python, PHP, and Tcl languages. It supports the popular XML query languages XQuery and XPath 2.0. I will show you how to use BDB XML in two ways. This month I will introduce the BDB XML shell, and next month we will explore using BDB XML with Java. BDB XML has a lot of features, and I will try to cover the most important ones.

What's an Embedded Database?
Some of you may be familiar with embedded databases. An embedded database runs within another program. It is not a stand-alone server such as Oracle, DB2, or eXist. It is the programmer's responsibility to invoke the proper API for the database. For example, although Berkeley DB XML has transaction support, in order to activate it, API calls have to be made by the programmer. Many popular relational and native XML databases run in client-server mode. Oracle, DB2, and eXist run as client-server applications, while database server runs as a stand-alone application and many clients can connect to it. Clients and server programs run in different address spaces, probably even on different machines. A connection can be established between them using the JDBC, or ODBC protocols. In an embedded database such as BDB XML, this is not the case; clients and server both run in the same address space, and they are actually the same program. The client manages the Berkeley DB XML through API calls. Next month I will give some Java examples, which will clarify what an embedded database is.

Installation
Installing BDB XML on the Windows operating system is very easy. BDB XML comes with an installer for Windows. If you are using a UNIX-like operating system you have to build BDB XML from the source code. See "Building Berkeley DB XML for UNIX/POSIX systems" for detailed build instructions. I installed BDB XML on my laptop, a 1.4GHz Pentium M processor running on Windows XP Professional with 512Mb main memory, without any problems. The installer will automatically set the necessary Path and Classpath environment variables.

Using dbxml Shell
The shell is located under the bin directory of the BDB XML installation directory. It should be in your Path, so you can run it from any location at the Windows command line.

Invoke the shell by typing "dbxml," and type "quit" to terminate the shell at any time. In order to list all available commands, type "help." To obtain detailed information about a command, type help and the command name. For example "help createContainer" will display usage information of the createContainer command.

createContainer
First of all we have to create a container in which to store our data. A container in BDB XML is similar to a folder. Many XML files can reside in a container. In this article I am going to use XBench sample XML files and queries. You can download the XML sample files and the XML Schema from www.cs.umb.edu/~smimarog/xmlsample/.

Let's create a container called xbench.dbxml. You can, of course, name it anything you want.

dbxml> createContainer xbench.dbxml
Creating node storage container with nodes indexed

The user can choose between two storage types: Wholedoc or Node. The default is Node type storage. If you choose Wholedoc storage, the XML files will be stored as they are with all of the white space preserved. Node storage performs better. The BDB XML documentation suggests not using Wholedoc storage for documents bigger than 1MB.

putDocument
Now we are ready to put some XML data into xbench.dbxml container. In BDB XML the opened container is the default container. We have to open xbench.dbxml before working with it.

dbxml> openContainer xbench.dbxml

Actually, dbxml shell commands on Windows are case insensitive, so it's possible to write opencontainer instead of openContainer, but I am going to follow the established naming conventions here.

dbxml> putDocument sample_10 C:\dictionary10.xml f
Document added, name = sample_10

The putDocument command takes three arguments: the first argument is the unique identifier of the document, the second argument is file name, and the third is either f or s. I chose the name sample_10 as the unique identification of this XML document. C:\dictionary10.xml is the file name, including the path. Finally, the last argument f states that it's a file. Instead of providing a filename it's possible to provide XML data itself within quotes. There are several examples in the "Introduction to Berkeley DB XML" document, which uses an XML string instead of a file name.

Query
As mentioned earlier, BDB XML supports XQuery and XPath 2.0 languages. There will be several examples for each language (see Listings 1-4). BDB XML found 28 results for this query. Note that in order to retrieve the results we have to use "print" command.

setVerbose
BDB XML will produce more explanation about what it does when the setVerbose feature is turned on. This command takes two numeric arguments: level and category. Using this command, let's get more information regarding the XQuery Example2 (see Listing 5).

No indices are used in this query. The execution time is around 48.5 seconds. Execution time is very high for this query for two reasons: this is not a trivial query, and there are no indices created yet. I will conceptually explain the possible execution plan for this query:

For each $entry as /dictionary/e
  For each descendant $desc of $entry
   For each value $val of $desc
   Check if 'hockey' is a substring
    of $val
   return $entry/hwg hw

There are three for loops in this algorithm. At the heart of the three loops there is a very costly substring (contains) operation. Considering that no indices have been created, execution time is pretty good. Creating indices improves the query performance dramatically (see Listings 6-7). (You can find more information about indices in the Indexing section of this article.)

More Stories By Selim Mimaroglu

Selim Mimaroglu is a PhD candidate in computer science at the University of Massachusetts in Boston. He holds an MS in computer science from that school and has a BS in electrical engineering.

Comments (5)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@ThingsExpo Stories
There's Big Data, then there's really Big Data from the Internet of Things. IoT is evolving to include many data possibilities like new types of event, log and network data. The volumes are enormous, generating tens of billions of logs per day, which raise data challenges. Early IoT deployments are relying heavily on both the cloud and managed service providers to navigate these challenges. In her session at Big Data Expo®, Hannah Smalltree, Director at Treasure Data, discussed how IoT, Big Data and deployments are processing massive data volumes from wearables, utilities and other machines...
Buzzword alert: Microservices and IoT at a DevOps conference? What could possibly go wrong? In this Power Panel at DevOps Summit, moderated by Jason Bloomberg, the leading expert on architecting agility for the enterprise and president of Intellyx, panelists will peel away the buzz and discuss the important architectural principles behind implementing IoT solutions for the enterprise. As remote IoT devices and sensors become increasingly intelligent, they become part of our distributed cloud environment, and we must architect and code accordingly. At the very least, you'll have no problem fil...
SYS-CON Events announced today that MetraTech, now part of Ericsson, has been named “Silver Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. Ericsson is the driving force behind the Networked Society- a world leader in communications infrastructure, software and services. Some 40% of the world’s mobile traffic runs through networks Ericsson has supplied, serving more than 2.5 billion subscribers.
The 4th International Internet of @ThingsExpo, co-located with the 17th International Cloud Expo - to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA - announces that its Call for Papers is open. The Internet of Things (IoT) is the biggest idea since the creation of the Worldwide Web more than 20 years ago.
The worldwide cellular network will be the backbone of the future IoT, and the telecom industry is clamoring to get on board as more than just a data pipe. In his session at @ThingsExpo, Evan McGee, CTO of Ring Plus, Inc., discussed what service operators can offer that would benefit IoT entrepreneurs, inventors, and consumers. Evan McGee is the CTO of RingPlus, a leading innovative U.S. MVNO and wireless enabler. His focus is on combining web technologies with traditional telecom to create a new breed of unified communication that is easily accessible to the general consumer. With over a de...
Disruptive macro trends in technology are impacting and dramatically changing the "art of the possible" relative to supply chain management practices through the innovative use of IoT, cloud, machine learning and Big Data to enable connected ecosystems of engagement. Enterprise informatics can now move beyond point solutions that merely monitor the past and implement integrated enterprise fabrics that enable end-to-end supply chain visibility to improve customer service delivery and optimize supplier management. Learn about enterprise architecture strategies for designing connected systems tha...
Cloud is not a commodity. And no matter what you call it, computing doesn’t come out of the sky. It comes from physical hardware inside brick and mortar facilities connected by hundreds of miles of networking cable. And no two clouds are built the same way. SoftLayer gives you the highest performing cloud infrastructure available. One platform that takes data centers around the world that are full of the widest range of cloud computing options, and then integrates and automates everything. Join SoftLayer on June 9 at 16th Cloud Expo to learn about IBM Cloud's SoftLayer platform, explore se...
SYS-CON Media announced today that 9 out of 10 " most read" DevOps articles are published by @DevOpsSummit Blog. Launched in October 2014, @DevOpsSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development cycles that produce softw...
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo in Silicon Valley. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 17th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal an...
15th Cloud Expo, which took place Nov. 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA, expanded the conference content of @ThingsExpo, Big Data Expo, and DevOps Summit to include two developer events. IBM held a Bluemix Developer Playground on November 5 and ElasticBox held a Hackathon on November 6. Both events took place on the expo floor. The Bluemix Developer Playground, for developers of all levels, highlighted the ease of use of Bluemix, its services and functionality and provide short-term introductory projects that developers can complete between sessions.
From telemedicine to smart cars, digital homes and industrial monitoring, the explosive growth of IoT has created exciting new business opportunities for real time calls and messaging. In his session at @ThingsExpo, Ivelin Ivanov, CEO and Co-Founder of Telestax, shared some of the new revenue sources that IoT created for Restcomm – the open source telephony platform from Telestax. Ivelin Ivanov is a technology entrepreneur who founded Mobicents, an Open Source VoIP Platform, to help create, deploy, and manage applications integrating voice, video and data. He is the co-founder of TeleStax, a...
The Internet of Things (IoT) promises to evolve the way the world does business; however, understanding how to apply it to your company can be a mystery. Most people struggle with understanding the potential business uses or tend to get caught up in the technology, resulting in solutions that fail to meet even minimum business goals. In his session at @ThingsExpo, Jesse Shiah, CEO / President / Co-Founder of AgilePoint Inc., showed what is needed to leverage the IoT to transform your business. He discussed opportunities and challenges ahead for the IoT from a market and technical point of vie...
Grow your business with enterprise wearable apps using SAP Platforms and Google Glass. SAP and Google just launched the SAP and Google Glass Challenge, an opportunity for you to innovate and develop the best Enterprise Wearable App using SAP Platforms and Google Glass and gain valuable market exposure. In his session at @ThingsExpo, Brian McPhail, Senior Director of Business Development, ISVs & Digital Commerce at SAP, outlined the timeline of the SAP Google Glass Challenge and the opportunity for developers, start-ups, and companies of all sizes to engage with SAP today.
The 3rd International @ThingsExpo, co-located with the 16th International Cloud Expo – to be held June 9-11, 2015, at the Javits Center in New York City, NY – is now accepting Hackathon proposals. Hackathon sponsorship benefits include general brand exposure and increasing engagement with the developer ecosystem. At Cloud Expo 2014 Silicon Valley, IBM held the Bluemix Developer Playground on November 5 and ElasticBox held the DevOps Hackathon on November 6. Both events took place on the expo floor. The Bluemix Developer Playground, for developers of all levels, highlighted the ease of use of...
Enthusiasm for the Internet of Things has reached an all-time high. In 2013 alone, venture capitalists spent more than $1 billion dollars investing in the IoT space. With "smart" appliances and devices, IoT covers wearable smart devices, cloud services to hardware companies. Nest, a Google company, detects temperatures inside homes and automatically adjusts it by tracking its user's habit. These technologies are quickly developing and with it come challenges such as bridging infrastructure gaps, abiding by privacy concerns and making the concept a reality. These challenges can't be addressed w...
The industrial software market has treated data with the mentality of “collect everything now, worry about how to use it later.” We now find ourselves buried in data, with the pervasive connectivity of the (Industrial) Internet of Things only piling on more numbers. There’s too much data and not enough information. In his session at @ThingsExpo, Bob Gates, Global Marketing Director, GE’s Intelligent Platforms business, to discuss how realizing the power of IoT, software developers are now focused on understanding how industrial data can create intelligence for industrial operations. Imagine ...
SYS-CON Events announced today that Liaison Technologies, a leading provider of data management and integration cloud services and solutions, has been named "Silver Sponsor" of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York, NY. Liaison Technologies is a recognized market leader in providing cloud-enabled data integration and data management solutions to break down complex information barriers, enabling enterprises to make smarter decisions, faster.
The 17th International Cloud Expo has announced that its Call for Papers is open. 17th International Cloud Expo, to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, APM, APIs, Microservices, Security, Big Data, Internet of Things, DevOps and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal today!
Hadoop as a Service (as offered by handful of niche vendors now) is a cloud computing solution that makes medium and large-scale data processing accessible, easy, fast and inexpensive. In his session at Big Data Expo, Kumar Ramamurthy, Vice President and Chief Technologist, EIM & Big Data, at Virtusa, will discuss how this is achieved by eliminating the operational challenges of running Hadoop, so one can focus on business growth. The fragmented Hadoop distribution world and various PaaS solutions that provide a Hadoop flavor either make choices for customers very flexible in the name of opti...
Cultural, regulatory, environmental, political and economic (CREPE) conditions over the past decade are creating cross-industry solution spaces that require processes and technologies from both the Internet of Things (IoT), and Data Management and Analytics (DMA). These solution spaces are evolving into Sensor Analytics Ecosystems (SAE) that represent significant new opportunities for organizations of all types. Public Utilities throughout the world, providing electricity, natural gas and water, are pursuing SmartGrid initiatives that represent one of the more mature examples of SAE. We have s...