Tokenization for De-Identifying APIs

[Figure: De-identifying Data in APIs]

I was catching up on my RSS feeds over the weekend, reading everything I missed while I was at IDF, when I saw this great post from Kin Lane calling for “A Masking, Scrubbing, Anonymizing API”. It reminded me of a conversation I had at IDF about Kaggle, a platform for crowdsourcing solutions to big data problems. In both cases, the goal is to surface data in a way that protects personal information. It got me thinking about how compliance intersects with API strategies. With APIs serving as a universal tunnel into the enterprise, it’s important not to neglect security compliance for the content that APIs expose! Fortunately, a tokenization proxy or API manager can be used to address these types of usage models.

Tokenization vs. Encryption vs. Redaction

Tokenization is the process of replacing a sensitive string with a randomized surrogate string (the token). Expressway Tokenization Broker can perform this operation as a proxy for any API response, storing the PII in a secure vault. The only way to recover the original data is through a detokenization routine performed by a system with access to the secure vault. This is somewhat similar to the mechanism Kin describes (replacing actual values with fake values), except that the tokens are not likely to be human-readable (e.g., instead of replacing Kin Lane with John Doe, it might wind up reading zAe N8fc). On the other hand, tokenization preserves correlation: the same input always maps to the same token, whereas if you replace every instance of any name with “John Doe” you may lose the ability to do associations across data sets. The retail industry has been using this mechanism for years, adopting the tokenization of Primary Account Numbers (PANs) as a best practice for PCI compliance. We have recently seen adoption of this tokenization capability for other types of PII, particularly where there are compliance and audit concerns.
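To make the vault idea concrete, here is a minimal Python sketch. It assumes an in-memory dictionary stands in for the secure vault; `TokenVault`, `tokenize`, and `detokenize` are hypothetical names for illustration and are not the Expressway API.

```python
import secrets
import string


class TokenVault:
    """Hypothetical in-memory stand-in for a secure token vault."""

    _ALPHABET = string.ascii_letters + string.digits

    def __init__(self, token_length: int = 12):
        self._token_length = token_length
        self._token_by_value = {}
        self._value_by_token = {}

    def _new_token(self) -> str:
        # Random token with no mathematical relationship to the input.
        return "".join(
            secrets.choice(self._ALPHABET) for _ in range(self._token_length)
        )

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so repeated values stay correlated.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = self._new_token()
        while token in self._value_by_token:  # extremely unlikely collision
            token = self._new_token()
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only a caller with access to the vault can recover the original.
        return self._value_by_token[token]


vault = TokenVault()
first = vault.tokenize("Kin Lane")
second = vault.tokenize("Kin Lane")
```

Here `first` and `second` are the same token, which is what keeps records joinable across data sets, and only `vault.detokenize(first)` recovers the original name. A real deployment would persist the mapping in hardened storage rather than process memory.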

[Figure: Tokenization Process. Caption: Tokenization of Payment Account Numbers for PCI Compliance]

Format-Preserving Encryption (FPE) is another mechanism for de-identifying data. It is available in all of our Expressway products. In this case, the data is encrypted into ciphertext that conforms to the same format as the input. For example, the SSN 123-45-1234 might encrypt to 789-12-3456. This ensures that the ciphertext will pass any downstream format checking that may occur. However, unlike tokenization, FPE can be reversed without access to a tokenization vault: any system that holds the key can decrypt the ciphertext back to plaintext. This makes the output behave like any other encrypted data, enabling applications to use a shared secret to decrypt it independently of the secure vault.
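The format-preserving property can be illustrated with a toy Feistel network over decimal digits. This is a sketch for intuition only: it is not the algorithm used by Expressway, it is not a standardized mode such as NIST FF1, and it should not be used to protect real data. The function names and the demo key are assumptions.

```python
import hashlib
import hmac


def _round_value(key: bytes, round_no: int, half: int, modulus: int) -> int:
    # Keyed pseudo-random round function built from HMAC-SHA256.
    digest = hmac.new(key, f"{round_no}:{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _feistel(key: bytes, digits: str, decrypt: bool, rounds: int = 10) -> str:
    # rounds must be even so the two halves end with their original widths.
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("need at least two decimal digits")
    u = len(digits) // 2
    v = len(digits) - u
    a, b = int(digits[:u]), int(digits[u:])
    order = range(rounds - 1, -1, -1) if decrypt else range(rounds)
    for i in order:
        modulus = 10 ** (u if i % 2 == 0 else v)
        if decrypt:
            a, b = (b - _round_value(key, i, a, modulus)) % modulus, a
        else:
            a, b = b, (a + _round_value(key, i, b, modulus)) % modulus
    return f"{a:0{u}d}{b:0{v}d}"


def fpe_ssn(key: bytes, ssn: str, decrypt: bool = False) -> str:
    # Strip the dashes, transform the nine digits, then restore the layout.
    out = _feistel(key, ssn.replace("-", ""), decrypt)
    return f"{out[:3]}-{out[3:5]}-{out[5:]}"


key = b"demo shared secret"  # hypothetical key for illustration
ciphertext = fpe_ssn(key, "123-45-1234")
plaintext = fpe_ssn(key, ciphertext, decrypt=True)
```

The ciphertext still looks like an SSN (three digits, two digits, four digits), and anyone holding `key` can recover the original without consulting a vault.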

Finally, data can be anonymized using redaction, which is also supported in all of our products. This is the process of eliminating PII entirely rather than replacing it. This is the most surefire mechanism for keeping PII out of the wrong hands, but it comes with a potential drawback: it may prevent records from being associated with the same owner, particularly across data sets. That correlation is often the most valuable part of big data analysis.
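A small Python sketch shows the trade-off. The two records and the `redact` helper are made up for illustration.

```python
def redact(record: dict, pii_fields: set) -> dict:
    """Drop PII fields entirely instead of replacing them."""
    return {name: value for name, value in record.items() if name not in pii_fields}


# Two hypothetical data sets that describe the same person.
hr = {"SSN": "123-45-1234", "CurrentSalary": 60000}
claims = {"SSN": "123-45-1234", "ClaimTotal": 1200}

PII = {"SSN"}
redacted_hr = redact(hr, PII)          # {'CurrentSalary': 60000}
redacted_claims = redact(claims, PII)  # {'ClaimTotal': 1200}
```

After redaction, nothing in the two results indicates that they belong to the same owner, so the records can no longer be joined.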

De-Identification Using the Façade API Proxy Pattern

We have seen customers take advantage of regular expressions to identify personally identifiable information (PII). There are standard policies that can pick out Social Security numbers, email addresses, and other common types of PII in any API. Nonstandard types of PII can be detected as well, provided that they conform to a well-defined structure (generally alphanumeric with a fixed length, although other patterns can be identified too). Once the PII has been identified, the data can be de-identified using tokenization or encryption (including format-preserving encryption), or it can be anonymized completely via redaction. This policy can be generalized to proxy several APIs and replace any PII that passes through. This works particularly well for credit card and Social Security numbers, both of which follow a well-defined and distinctive pattern.
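As a rough illustration of a generic, pattern-based policy, the Python sketch below scans a response body and redacts anything that looks like PII. The patterns are deliberately simplified assumptions (for example, only 16-digit card numbers are covered) and are not the policies shipped with any product.

```python
import re

# Simplified, illustrative patterns; production rules would be stricter.
PII_PATTERNS = {
    "card": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


def redact_pii(body: str) -> str:
    """Replace every match of every pattern with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        body = pattern.sub(f"[{label.upper()} REDACTED]", body)
    return body


response = "SSN 123-45-1234, card 4111 1111 1111 1111, mail sally@example.com"
cleaned = redact_pii(response)
```

Because the function only looks at the text of the response, the same policy can sit in front of several unrelated APIs. Swapping the placeholder for a tokenization or encryption call would turn the same loop into a tokenizing or encrypting proxy.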

Anonymization policies can also be tailored to specific APIs that have well-defined schemas (along the lines of the Swagger example that Kin suggested), matching based on the JSON or XML field information. For example, a colleague and I were playing with the idea of stashing employee information in DynamoDB. An employee record might look like:

{
  "Name": { "S": "Sally Rockstar" },
  "Email": { "S": "[email protected]" },
  "City": { "S": "Mountain View" },
  "State": { "S": "CA" },
  "Zip": { "N": "90210" },
  "DriversLic": { "S": "A1234567" },
  "SSN": { "S": "123-45-1234" },
  "CurrentSalary": { "N": "60000" }
}

Within this data set, email addresses, SSNs, driver's license numbers, and zip codes follow well-established rules that lend themselves to regular expressions. However, the zip code rule (a 5-digit number) could also match the salary field. Obviously you could enforce Zip+4 and decimal inputs (XXXXX-XXXX for Zip, XXXXX.XX for CurrentSalary), but it would probably be safer to match on the field name rather than the value for this data set.
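The ambiguity is easy to reproduce. The Python sketch below uses a trimmed copy of the record; the `PII_FIELDS` set and the `deidentify_by_name` helper are assumptions made for illustration.

```python
import re

ZIP_RULE = re.compile(r"\d{5}")

record = {
    "Zip": {"N": "90210"},
    "SSN": {"S": "123-45-1234"},
    "CurrentSalary": {"N": "60000"},
}

# Value-based matching: the zip rule also flags the salary.
value_hits = [
    name
    for name, attr in record.items()
    if any(ZIP_RULE.fullmatch(value) for value in attr.values())
]

# Name-based matching: only fields listed in the policy are touched.
PII_FIELDS = {"Name", "Email", "Zip", "DriversLic", "SSN"}  # assumed policy


def deidentify_by_name(item: dict, pii_fields: set = PII_FIELDS) -> dict:
    """Mask attributes whose field name is listed in the policy."""
    return {
        name: ({"S": "REDACTED"} if name in pii_fields else attr)
        for name, attr in item.items()
    }


safe = deidentify_by_name(record)
```

Here `value_hits` contains both `Zip` and `CurrentSalary`, while `safe` masks `Zip` and `SSN` and leaves the salary intact.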

Another benefit of the anonymizing facade API pattern is that it can support conditional de-identification.  For example, I may want to allow the PII to be read within my network but have it de-identified for external clients.  Or I may want to tokenize internally but redact externally.  We can define a workflow that uses any of a number of factors to make the decision at API request time, allowing access to live data rather than a snapshot.
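A conditional policy of this kind might be sketched in Python as follows. The internal network range, the field list, and the function names are all hypothetical, and the client address is only one of many factors a real workflow could use.

```python
import ipaddress
from typing import Callable

INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]  # assumed internal range
PII_FIELDS = {"Name", "SSN"}  # assumed policy


def choose_policy(client_ip: str) -> str:
    """Tokenize for internal callers, redact for everyone else."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in network for network in INTERNAL_NETWORKS):
        return "tokenize"
    return "redact"


def respond(record: dict, client_ip: str, tokenize: Callable[[str], str]) -> dict:
    """Apply the policy to live data at request time."""
    policy = choose_policy(client_ip)
    out = {}
    for field, value in record.items():
        if field not in PII_FIELDS:
            out[field] = value
        elif policy == "tokenize":
            out[field] = tokenize(value)
        # policy == "redact": the field is omitted entirely
    return out


record = {"Name": "Sally Rockstar", "City": "Mountain View"}
internal_view = respond(record, "10.1.2.3", str.upper)    # str.upper is a stand-in tokenizer
external_view = respond(record, "203.0.113.9", str.upper)
```

The decision is made per request, so both callers read the same live record and simply receive different views of it.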


I’m excited about the potential for APIs to allow faster problem solving through crowdsourcing.  Kaggle looks like a very interesting platform for enabling this.  I’m also happy to see folks like Kin working to make government more open and accessible through the use of APIs.  API gateways can play a role in those transformations by sanitizing the data, reducing the risk of PII being compromised.  As Mark Silverberg pointed out in the comments on Kin’s blog, the safest way to protect PII is to scrub the data set before it goes out.  By using a tokenizing or encrypting proxy facade, the “scrubbing” is made internal, minimizing the risk of an escape.

As I noted above, our products are unique in the API management space, in that they support high-performance de-identification policies.  They also include powerful regular expression libraries that can be used to identify (and then de-identify) PII that is contained in an API response.  I did a webinar with John Kindervag recently that touched on many of these topics as well.  You can watch the replay to learn more, or try out FPE and redaction for yourself using Expressway API Manager on Amazon Web Services.

The post Tokenization for De-Identifying APIs appeared first on Application Security.
