Welcome!

XML Authors: Liz McMillan, Mike Kavis, Alan Clark, Elizabeth White, AppDynamics Blog

Related Topics: XML

XML: Article

Open Source Database Special Feature: An Introduction to Berkeley DB XML

Basic concepts, the shell commands, and beyond

In this article I am going to introduce you to the latest version of the Berkeley DB XML, version 2.2.8. Berkeley DB XML (BDB XML) is built on top of the well-known Berkeley Database (BDB). BDB XML is an open source, native XML database. Like its ancestor, BDB, it's an embedded database. It provides APIs for the Java, C++, Perl, Python, PHP, and Tcl languages. It supports the popular XML query languages XQuery and XPath 2.0. I will show you how to use BDB XML in two ways. This month I will introduce the BDB XML shell, and next month we will explore using BDB XML with Java. BDB XML has a lot of features, and I will try to cover the most important ones.

What's an Embedded Database?
Some of you may be familiar with embedded databases. An embedded database runs within another program. It is not a stand-alone server such as Oracle, DB2, or eXist. It is the programmer's responsibility to invoke the proper API for the database. For example, although Berkeley DB XML has transaction support, in order to activate it, API calls have to be made by the programmer. Many popular relational and native XML databases run in client-server mode. Oracle, DB2, and eXist run as client-server applications, while database server runs as a stand-alone application and many clients can connect to it. Clients and server programs run in different address spaces, probably even on different machines. A connection can be established between them using the JDBC, or ODBC protocols. In an embedded database such as BDB XML, this is not the case; clients and server both run in the same address space, and they are actually the same program. The client manages the Berkeley DB XML through API calls. Next month I will give some Java examples, which will clarify what an embedded database is.

Installation
Installing BDB XML on the Windows operating system is very easy. BDB XML comes with an installer for Windows. If you are using a UNIX-like operating system you have to build BDB XML from the source code. See "Building Berkeley DB XML for UNIX/POSIX systems" for detailed build instructions. I installed BDB XML on my laptop, a 1.4GHz Pentium M processor running on Windows XP Professional with 512Mb main memory, without any problems. The installer will automatically set the necessary Path and Classpath environment variables.

Using dbxml Shell
The shell is located under the bin directory of the BDB XML installation directory. It should be in your Path, so you can run it from any location at the Windows command line.

Invoke the shell by typing "dbxml," and type "quit" to terminate the shell at any time. In order to list all available commands, type "help." To obtain detailed information about a command, type help and the command name. For example "help createContainer" will display usage information of the createContainer command.

createContainer
First of all we have to create a container in which to store our data. A container in BDB XML is similar to a folder. Many XML files can reside in a container. In this article I am going to use XBench sample XML files and queries. You can download the XML sample files and the XML Schema from www.cs.umb.edu/~smimarog/xmlsample/.

Let's create a container called xbench.dbxml. You can, of course, name it anything you want.

dbxml> createContainer xbench.dbxml
Creating node storage container with nodes indexed

The user can choose between two storage types: Wholedoc or Node. The default is Node type storage. If you choose Wholedoc storage, the XML files will be stored as they are with all of the white space preserved. Node storage performs better. The BDB XML documentation suggests not using Wholedoc storage for documents bigger than 1MB.

putDocument
Now we are ready to put some XML data into xbench.dbxml container. In BDB XML the opened container is the default container. We have to open xbench.dbxml before working with it.

dbxml> openContainer xbench.dbxml

Actually, dbxml shell commands on Windows are case insensitive, so it's possible to write opencontainer instead of openContainer, but I am going to follow the established naming conventions here.

dbxml> putDocument sample_10 C:\dictionary10.xml f
Document added, name = sample_10

The putDocument command takes three arguments: the first argument is the unique identifier of the document, the second argument is file name, and the third is either f or s. I chose the name sample_10 as the unique identification of this XML document. C:\dictionary10.xml is the file name, including the path. Finally, the last argument f states that it's a file. Instead of providing a filename it's possible to provide XML data itself within quotes. There are several examples in the "Introduction to Berkeley DB XML" document, which uses an XML string instead of a file name.

Query
As mentioned earlier, BDB XML supports XQuery and XPath 2.0 languages. There will be several examples for each language (see Listings 1-4). BDB XML found 28 results for this query. Note that in order to retrieve the results we have to use "print" command.

setVerbose
BDB XML will produce more explanation about what it does when the setVerbose feature is turned on. This command takes two numeric arguments: level and category. Using this command, let's get more information regarding the XQuery Example2 (see Listing 5).

No indices are used in this query. The execution time is around 48.5 seconds. Execution time is very high for this query for two reasons: this is not a trivial query, and there are no indices created yet. I will conceptually explain the possible execution plan for this query:

For each $entry as /dictionary/e
  For each descendant $desc of $entry
   For each value $val of $desc
   Check if 'hockey' is a substring
    of $val
   return $entry/hwg hw

There are three for loops in this algorithm. At the heart of the three loops there is a very costly substring (contains) operation. Considering that no indices have been created, execution time is pretty good. Creating indices improves the query performance dramatically (see Listings 6-7). (You can find more information about indices in the Indexing section of this article.)

More Stories By Selim Mimaroglu

Selim Mimaroglu is a PhD candidate in computer science at the University of Massachusetts in Boston. He holds an MS in computer science from that school and has a BS in electrical engineering.

Comments (5) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Most Recent Comments
SYS-CON Belgium News Desk 12/20/05 01:07:02 PM EST

Open Source Database Special Feature: An Introduction to Berkeley DB XML. In this article I am going to introduce you to the latest version of the Berkeley DB XML, version 2.2.8. Berkeley DB XML (BDB XML) is built on top of the well-known Berkeley Database (BDB). BDB XML is an open source, native XML database. Like its ancestor, BDB, it's an embedded database. It provides APIs for the Java, C++, Perl, Python, PHP, and Tcl languages. It supports the popular XML query languages XQuery and XPath 2.0. I will show you how to use BDB XML in two ways. This month I will introduce the BDB XML shell, and next month we will explore using BDB XML with Java. BDB XML has a lot of features, and I will try to cover the most important ones.

SYS-CON Canada News Desk 12/20/05 12:23:13 PM EST

Open Source Database Special Feature: An Introduction to Berkeley DB XML. In this article I am going to introduce you to the latest version of the Berkeley DB XML, version 2.2.8. Berkeley DB XML (BDB XML) is built on top of the well-known Berkeley Database (BDB). BDB XML is an open source, native XML database. Like its ancestor, BDB, it's an embedded database. It provides APIs for the Java, C++, Perl, Python, PHP, and Tcl languages. It supports the popular XML query languages XQuery and XPath 2.0. I will show you how to use BDB XML in two ways. This month I will introduce the BDB XML shell, and next month we will explore using BDB XML with Java. BDB XML has a lot of features, and I will try to cover the most important ones.

SYS-CON Germany News Desk 12/20/05 11:39:03 AM EST

Open Source Database Special Feature: An Introduction to Berkeley DB XML. In this article I am going to introduce you to the latest version of the Berkeley DB XML, version 2.2.8. Berkeley DB XML (BDB XML) is built on top of the well-known Berkeley Database (BDB). BDB XML is an open source, native XML database. Like its ancestor, BDB, it's an embedded database. It provides APIs for the Java, C++, Perl, Python, PHP, and Tcl languages. It supports the popular XML query languages XQuery and XPath 2.0. I will show you how to use BDB XML in two ways. This month I will introduce the BDB XML shell, and next month we will explore using BDB XML with Java. BDB XML has a lot of features, and I will try to cover the most important ones.

SYS-CON UK News Desk 12/20/05 10:48:56 AM EST

Open Source Database Special Feature: An Introduction to Berkeley DB XML. In this article I am going to introduce you to the latest version of the Berkeley DB XML, version 2.2.8. Berkeley DB XML (BDB XML) is built on top of the well-known Berkeley Database (BDB). BDB XML is an open source, native XML database. Like its ancestor, BDB, it's an embedded database. It provides APIs for the Java, C++, Perl, Python, PHP, and Tcl languages. It supports the popular XML query languages XQuery and XPath 2.0. I will show you how to use BDB XML in two ways. This month I will introduce the BDB XML shell, and next month we will explore using BDB XML with Java. BDB XML has a lot of features, and I will try to cover the most important ones.

XML News Desk 12/20/05 10:12:48 AM EST

Open Source Database Special Feature: An Introduction to Berkeley DB XML. In this article I am going to introduce you to the latest version of the Berkeley DB XML, version 2.2.8. Berkeley DB XML (BDB XML) is built on top of the well-known Berkeley Database (BDB). BDB XML is an open source, native XML database. Like its ancestor, BDB, it's an embedded database. It provides APIs for the Java, C++, Perl, Python, PHP, and Tcl languages. It supports the popular XML query languages XQuery and XPath 2.0. I will show you how to use BDB XML in two ways. This month I will introduce the BDB XML shell, and next month we will explore using BDB XML with Java. BDB XML has a lot of features, and I will try to cover the most important ones.

@ThingsExpo Stories
Code Halos - aka "digital fingerprints" - are the key organizing principle to understand a) how dumb things become smart and b) how to monetize this dynamic. In his session at @ThingsExpo, Robert Brown, AVP, Center for the Future of Work at Cognizant Technology Solutions, outlined research, analysis and recommendations from his recently published book on this phenomena on the way leading edge organizations like GE and Disney are unlocking the Internet of Things opportunity and what steps your organization should be taking to position itself for the next platform of digital competition.
The Industrial Internet revolution is now underway, enabled by connected machines and billions of devices that communicate and collaborate. The massive amounts of Big Data requiring real-time analysis is flooding legacy IT systems and giving way to cloud environments that can handle the unpredictable workloads. Yet many barriers remain until we can fully realize the opportunities and benefits from the convergence of machines and devices with Big Data and the cloud, including interoperability, data security and privacy.
In their session at @ThingsExpo, Shyam Varan Nath, Principal Architect at GE, and Ibrahim Gokcen, who leads GE's advanced IoT analytics, focused on the Internet of Things / Industrial Internet and how to make it operational for business end-users. Learn about the challenges posed by machine and sensor data and how to marry it with enterprise data. They also discussed the tips and tricks to provide the Industrial Internet as an end-user consumable service using Big Data Analytics and Industrial Cloud.
SYS-CON Media announced that Splunk, a provider of the leading software platform for real-time Operational Intelligence, has launched an ad campaign on Big Data Journal. Splunk software and cloud services enable organizations to search, monitor, analyze and visualize machine-generated big data coming from websites, applications, servers, networks, sensors and mobile devices. The ads focus on delivering ROI - how improved uptime delivered $6M in annual ROI, improving customer operations by mining large volumes of unstructured data, and how data tracking delivers uptime when it matters most.
Today’s enterprise is being driven by disruptive competitive and human capital requirements to provide enterprise application access through not only desktops, but also mobile devices. To retrofit existing programs across all these devices using traditional programming methods is very costly and time consuming – often prohibitively so. In his session at @ThingsExpo, Jesse Shiah, CEO, President, and Co-Founder of AgilePoint Inc., discussed how you can create applications that run on all mobile devices as well as laptops and desktops using a visual drag-and-drop application – and eForms-buildi...
Things are being built upon cloud foundations to transform organizations. This CEO Power Panel at 15th Cloud Expo, moderated by Roger Strukhoff, Cloud Expo and @ThingsExpo conference chair, addressed the big issues involving these technologies and, more important, the results they will achieve. Rodney Rogers, chairman and CEO of Virtustream; Brendan O'Brien, co-founder of Aria Systems, Bart Copeland, president and CEO of ActiveState Software; Jim Cowie, chief scientist at Dyn; Dave Wagstaff, VP and chief architect at BSQUARE Corporation; Seth Proctor, CTO of NuoDB, Inc.; and Andris Gailitis, C...
SYS-CON Events announced today that CodeFutures, a leading supplier of database performance tools, has been named a “Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. CodeFutures is an independent software vendor focused on providing tools that deliver database performance tools that increase productivity during database development and increase database performance and scalability during production.
SYS-CON Events announced today that ActiveState, the leading independent Cloud Foundry and Docker-based PaaS provider, has been named “Silver Sponsor” of SYS-CON's DevOps Summit New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. ActiveState believes that enterprises gain a competitive advantage when they are able to quickly create, deploy and efficiently manage software solutions that immediately create business value, but they face many challenges that prevent them from doing so. The Company is uniquely positioned to help address these challenges thro...
IoT is still a vague buzzword for many people. In his session at @ThingsExpo, Mike Kavis, Vice President & Principal Cloud Architect at Cloud Technology Partners, discussed the business value of IoT that goes far beyond the general public's perception that IoT is all about wearables and home consumer services. He also discussed how IoT is perceived by investors and how venture capitalist access this space. Other topics discussed were barriers to success, what is new, what is old, and what the future may hold. Mike Kavis is Vice President & Principal Cloud Architect at Cloud Technology Pa...
Dale Kim is the Director of Industry Solutions at MapR. His background includes a variety of technical and management roles at information technology companies. While his experience includes work with relational databases, much of his career pertains to non-relational data in the areas of search, content management, and NoSQL, and includes senior roles in technical marketing, sales engineering, and support engineering. Dale holds an MBA from Santa Clara University, and a BA in Computer Science from the University of California, Berkeley.
SYS-CON Media announced that Cisco, a worldwide leader in IT that helps companies seize the opportunities of tomorrow, has launched a new ad campaign in Cloud Computing Journal. The ad campaign, a webcast titled 'Is Your Data Center Ready for the Application Economy?', focuses on the latest data center networking technologies, including SDN or ACI, and how customers are using SDN and ACI in their organizations to achieve business agility. The Cisco webcast is available on-demand.
The Internet of Things (IoT) promises to evolve the way the world does business; however, understanding how to apply it to your company can be a mystery. Most people struggle with understanding the potential business uses or tend to get caught up in the technology, resulting in solutions that fail to meet even minimum business goals. In his session at @ThingsExpo, Jesse Shiah, CEO / President / Co-Founder of AgilePoint Inc., showed what is needed to leverage the IoT to transform your business. He discussed opportunities and challenges ahead for the IoT from a market and technical point of vie...
"People are a lot more knowledgeable about APIs now. There are two types of people who work with APIs - IT people who want to use APIs for something internal and the product managers who want to do something outside APIs for people to connect to them," explained Roberto Medrano, Executive Vice President at SOA Software, in this SYS-CON.tv interview at Cloud Expo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Performance is the intersection of power, agility, control, and choice. If you value performance, and more specifically consistent performance, you need to look beyond simple virtualized compute. Many factors need to be considered to create a truly performant environment. In his General Session at 15th Cloud Expo, Harold Hannon, Sr. Software Architect at SoftLayer, discussed how to take advantage of a multitude of compute options and platform features to make cloud the cornerstone of your online presence.
In this Women in Technology Power Panel at 15th Cloud Expo, moderated by Anne Plese, Senior Consultant, Cloud Product Marketing at Verizon Enterprise, Esmeralda Swartz, CMO at MetraTech; Evelyn de Souza, Data Privacy and Compliance Strategy Leader at Cisco Systems; Seema Jethani, Director of Product Management at Basho Technologies; Victoria Livschitz, CEO of Qubell Inc.; Anne Hungate, Senior Director of Software Quality at DIRECTV, discussed what path they took to find their spot within the technology industry and how do they see opportunities for other women in their area of expertise.
Almost everyone sees the potential of Internet of Things but how can businesses truly unlock that potential. The key will be in the ability to discover business insight in the midst of an ocean of Big Data generated from billions of embedded devices via Systems of Discover. Businesses will also need to ensure that they can sustain that insight by leveraging the cloud for global reach, scale and elasticity.
“The age of the Internet of Things is upon us,” stated Thomas Svensson, senior vice-president and general manager EMEA, ThingWorx, “and working with forward-thinking companies, such as Elisa, enables us to deploy our leading technology so that customers can profit from complete, end-to-end solutions.” ThingWorx, a PTC® (Nasdaq: PTC) business and Internet of Things (IoT) platform provider, announced on Monday that Elisa, Finnish provider of mobile and fixed broadband subscriptions, will deploy ThingWorx® platform technology to enable a new Elisa IoT service in Finland and Estonia.
Advanced Persistent Threats (APTs) are increasing at an unprecedented rate. The threat landscape of today is drastically different than just a few years ago. Attacks are much more organized and sophisticated. They are harder to detect and even harder to anticipate. In the foreseeable future it's going to get a whole lot harder. Everything you know today will change. Keeping up with this changing landscape is already a daunting task. Your organization needs to use the latest tools, methods and expertise to guard against those threats. But will that be enough? In the foreseeable future attacks w...
As enterprises move to all-IP networks and cloud-based applications, communications service providers (CSPs) – facing increased competition from over-the-top providers delivering content via the Internet and independently of CSPs – must be able to offer seamless cloud-based communication and collaboration solutions that can scale for small, midsize, and large enterprises, as well as public sector organizations, in order to keep and grow market share. The latest version of Oracle Communications Unified Communications Suite gives CSPs the capability to do just that. In addition, its integration ...
From telemedicine to smart cars, digital homes and industrial monitoring, the explosive growth of IoT has created exciting new business opportunities for real time calls and messaging. In his session at @ThingsExpo, Ivelin Ivanov, CEO and Co-Founder of Telestax, shared some of the new revenue sources that IoT created for Restcomm – the open source telephony platform from Telestax. Ivelin Ivanov is a technology entrepreneur who founded Mobicents, an Open Source VoIP Platform, to help create, deploy, and manage applications integrating voice, video and data. He is the co-founder of TeleStax, a...