Friday, August 4, 2017

A Quick Talk On Generic Cloud App Integration With Informatica Cloud

The Informatica Cloud Integration platform is a complete cloud integration platform as a service (iPaaS). Informatica has been named a Leader in this space for the fourth consecutive year (Gartner's Magic Quadrant for Enterprise Integration Platform as a Service). It's definitely a top choice for data integration management.

With the wave of SaaS applications emerging on the market, it's essential for an integration platform to provide a flexible and efficient connector that can integrate various cloud applications in a generic manner. There are indeed many specific connectors built for individual applications, either by Informatica (Tier 1 and Tier 2 connectors) or by the vendors themselves (OEM connectors). But let's face it: most cloud applications evolve quickly, and their APIs get updated faster than you realize. How quickly can the connectors turn around to match the latest API updates? The answer is that connector updates inevitably lag behind, as most connectors are updated once a year or even less often.

For what it's worth, Informatica introduced the Rest V2 connector in the fall of 2016. It is the successor to the original connector, named simply "Rest Connector". It lets you connect directly to a cloud application's latest REST API to read or write data, and you can leverage it whether the cloud application is a source, a target, or a lookup system. It offers great improvements in flexibility, efficiency, ease of use, and performance. Given that REST has become the de facto standard for emerging cloud services, the Rest V2 connector can fill the role of a generic connector for any cloud application exposing a REST API. In this article, I will cover its great features and caveats, and then uncover some hidden gems that many may not have realized yet.

One of the great features of Rest V2 is the introduction of hierarchical data support. As we know, REST APIs deal with either XML or JSON, both of which are hierarchical data structures. Before Rest V2, transforming between hierarchical and relational data was cumbersome, often requiring you to write your own script as part of the integration process. In the Informatica fall 2016 release, the same release that introduced Rest V2, the Hierarchy Builder and Hierarchy Parser transformations were brought to life to fill the gap. The Hierarchy Builder builds hierarchical data from relational data, and the Hierarchy Parser does the opposite. Let's take a look at how they work.

First, a hierarchy schema needs to be created. It is used by the Hierarchy Builder/Parser to represent the data structure. To create a hierarchy schema, you need an XML schema file (.xsd) or a JSON sample file (.json). Note that the XML schema file is not necessarily provided by the vendor; you may have to create it yourself based on the XML payload.

After the hierarchy schema is created, a Hierarchy Builder/Parser can be used in a data mapping to map hierarchical data to relational data or vice versa.

That seems quite easy compared to writing your own transformation script. But let's look at the Rest V2 connector, as it's even better.

The core of the Rest V2 connector configuration is its Swagger support. I have to give Informatica a thumbs up for betting on Swagger, given its popularity and great potential as an industry-standard API framework. Not all vendors provide a Swagger definition for their API, but no worries: Informatica provides a tool right inside your Informatica Cloud web console, under Configure -> Swagger Files. If you don't see this option, contact Informatica support to enable it; you will need the Rest V2 connector license to use it.

While creating the Swagger file, you basically define the API details - URL, verb (POST/GET, etc.), authentication (username/password, token, etc.), headers, query parameters, body, and a sample response file. A couple of caveats in this process:

  • The JSON response file is required even if you are dealing with an XML response. In that case, you will need to convert the XML response to JSON format; there are many online tools for that.
  • If your raw body is in XML format and you have issues creating the Swagger file, after making sure nothing is wrong on your end, you may want to contact Informatica support and request that they create the Swagger file for you. Informatica support has its own internal tool for creating Swagger files. Based on my conversation with them, Informatica will improve the Swagger tool, so this will most likely no longer be an issue soon.
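To make the Swagger step concrete, here is a minimal sketch of a Swagger 2.0 definition along the lines of what the tool captures. The host, path, and field names below are invented for illustration and are not taken from any particular API:

```json
{
  "swagger": "2.0",
  "info": { "title": "Example Orders API", "version": "1.0" },
  "host": "api.example.com",
  "basePath": "/v1",
  "schemes": ["https"],
  "paths": {
    "/orders": {
      "post": {
        "consumes": ["application/json"],
        "produces": ["application/json"],
        "parameters": [
          {
            "name": "body",
            "in": "body",
            "required": true,
            "schema": {
              "type": "object",
              "properties": { "orderId": { "type": "string" } }
            }
          }
        ],
        "responses": {
          "200": {
            "description": "Order accepted",
            "schema": {
              "type": "object",
              "properties": { "status": { "type": "string" } }
            }
          }
        }
      }
    }
  }
}
```

The sample response under "responses" plays the same role as the JSON response file mentioned above: it tells the connector what fields to expose downstream.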

After the Swagger file is in place, create the Rest V2 connection, which is quite easy since the majority of the information needed for the API is already captured in the Swagger file.

Now the Rest V2 connector is ready to use in a data mapping. It can be used as a source to read data or as a target to write data. Of course, the API you defined in the initial Swagger file has to support the corresponding read or write operation. Hierarchical data mapping is handled by the Rest V2 connector itself when it serves as a source or target; there is no need for a separate Hierarchy Builder or Parser.

To save the best for last, the great power of the Rest V2 connector is using it as a midstream transformation. Let's work through a use case: say I need to pull data from system A and push it to system B via a REST API. If I use Rest V2 as the target for system B, then at the end of the flow, how do I know the data was successfully accepted by B? I would need to start another pipeline or another data mapping task to examine the log file and act from there. When Rest V2 is used midstream, the response of the REST call can be passed to the next step for follow-up action within the same flow. This is a differentiator, and it convinced me that the Rest V2 connector should not serve as a target but as a midstream transformation instead.

Another hidden gem of the Rest V2 connector as a midstream transformation is its ability to process multiple REST calls. This is definitely a sweet feature. For any REST call, one payload corresponds to one request and generates one response. If you have multiple payloads to process in one integration, you essentially need a loop to iterate over the individual records and make a REST call for each. Without the Rest V2 connector this is impossible, since there is no loop function in Informatica data integration. The alternative would be Informatica Cloud Real Time (ICRT), which provides a workflow-style designer where a task can jump from one step to another based on conditions - essentially achieving a loop. With the Rest V2 connector, no additional configuration is needed: it processes multiple records in one step, taking multiple records as input, making a REST call for each, and generating all the responses as output. Used midstream, we can easily follow up on those outputs in the same flow.
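Conceptually, the multi-record behavior amounts to a loop that issues one call per record and collects every response for the next step. This plain-Java sketch illustrates the idea only - it is not Informatica's implementation, and `callRestApi` here is a stand-in for the actual HTTP call:

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class MidstreamSketch {

    // Make one REST call per input record and collect every response,
    // so downstream steps can act on the outputs in the same flow.
    static List<String> processRecords(List<String> payloads,
                                       UnaryOperator<String> callRestApi) {
        return payloads.stream()
                       .map(callRestApi)   // one request per payload
                       .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A stand-in for a real REST call: echo an acknowledgement.
        UnaryOperator<String> fakeCall = body -> "ack:" + body;
        List<String> responses =
            processRecords(List.of("rec1", "rec2", "rec3"), fakeCall);
        System.out.println(responses);  // [ack:rec1, ack:rec2, ack:rec3]
    }
}
```

The point is that every response stays available in-stream, which is exactly what makes follow-up actions possible without a second pipeline.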

Wednesday, August 24, 2016

WebCenter Portal and JDeveloper Version Contrast

For easy reference, here is a version contrast between patchsets, WebCenter Portal, and JDeveloper with the corresponding WebCenter extensions.

(the info is extracted from Oracle doc 1627574.1)

Patchset + Bundle Patch | WebCenter Portal | JDeveloper | WebCenter Extension
dot9 + BP3 | 11.
dot9 + BP2 | 11.
dot9 + BP1 | 11.
dot8 + BP8 | 11.
dot8 + BP7 | 11.
dot8 + BP6 | 11.
dot8 + BP5 | 11.
dot8 + BP4 | 11.
dot8 + BP3 | 11.
dot8 + BP2 | 11.
dot8 + BP1 | 11.
PS6 + BP1 | 11.
PS5 + BP6 | 11.
PS5 + BP5 | 11.
PS5 + BP4 | 11.
PS5 + BP3 | 11.
PS4 + BP4 | 11.
PS2 + BP3 | 11.
PS1 + BP4 | 11.
 * The Guide to Developing Portals with Oracle WebCenter Portal and Oracle JDeveloper refers to the version of the WebCenter Portal extensions as (Chapter 2, 2.2.2 Installing the WebCenter Portal Extension for JDeveloper). These extensions were renamed to be in sync with the version of JDeveloper used for development.

Wednesday, April 20, 2016

Overview of "Profile" and "Folder" in WebCenter Content

In this post, I will go over the basics of two features in WebCenter Content (WCC) - Profile and Folder - and then talk about their own design considerations and usages, and finally discuss their differences.

In WCC, the Profile (a.k.a. metadata profile or content profile, to differentiate it from a user profile) is an approach to metadata modeling. The profile is a powerful tool for managing metadata fields to achieve efficient processing of content items, such as check-in, update, and search.

Essentially, content profiles consist of a set of rules that manage the display of the metadata fields of content items. Through content profiles, you can control which metadata fields are displayed or hidden, required or optional, read-only or editable, initialized with a default value or not, and how they are grouped - all based on the action being performed on the content: viewing content information, checking in, updating, or searching. Profiles provide the capability to customize the metadata user interface presented to users. This is important because a profile can improve not only the user experience but also data quality and accuracy. If a user is presented with too many irrelevant fields when dealing with a content item, the experience can be daunting, and very likely the user will not provide quality input. It doesn't take long before your content system fills with more and more irrelevant data and is, in a sense, abused. Bad profiling can hurt the search experience as well: if a profile is defined in a way that captures irrelevant and inaccurate data, the overall search output will be misleading.

One of the frustrations I heard from customers was that users wanted multiple profiles on a content item. In WCC, only one profile can be associated with a content item. You may see this as a drawback given your business use cases or the way you handle content processing. However, it's designed this way for a purpose: you can think of a profile as defining the type of a content item. Any content item belongs to exactly one type. The ability to aggregate multiple profiles onto one content item would very likely lead to data abuse and many other undesired outcomes. Another frustration was that "specific resources are required to build or update the profiles". From what we have discussed, a new profile should be created only if a new type of content item is required in the system. A profile should stay as static as possible and be updated only when the type of content changes with your business context. If the business context doesn't require any new content types or the retirement of existing ones, profiles should not be built or updated frequently. If they are, you may need to revisit the initial design and definition of the profiles to match the business needs. Since a profile is an approach to metadata modeling, proper metadata design is essential for daily content management; metadata in WCC should match your enterprise taxonomy to achieve the best outcome in content organization.

Speaking of the number of profiles, there is no good or bad number; it just needs to fit your business needs. The 50-100 range is the average across US WCC implementations (this statistic is not published anywhere; it comes from a technical summit with the Oracle WCC team). The extreme case I have seen is a client with over a thousand profiles in their WCC system, and it works just fine. There is a performance caveat with a very high number of profiles, though; I encountered such a performance issue in one of my WCC implementations, and it has been addressed by the Oracle team. For details, please check here.

The Folder, in WCC, is a way to structure and organize content items. It's worth noting that in WCC, folders are standalone "virtual" structures: content items are not physically stored in any folder. Every content item in a folder has a metadata field (xCollectionId) that stores a numeric folder ID linking the content item to a folder. It behaves like a symbolic link in the WCC system.

Content folders offer a conventional hierarchical structure that provides easy access to content items in WCC. Just like the directories on your local laptop, they point to virtual locations in the content system, and you can perform the same actions on them that you would in a conventional file system. As quoted from the Oracle documentation: "The familiar folder and file model provides a framework for organizing and accessing content stored in the repository. Functionally, folders and files are very similar to those in a conventional file system. You can copy, move, rename, and delete folders and files. You can also create shortcuts to folders or files so you can access a content item from multiple locations in the hierarchy. You can think of the files in the Folders interface as symbolic links or pointers to content items in the repository. The operations you perform in the Folders interface, such as searching or propagating metadata, effectively operate on the associated content items."

The hierarchical folder interface is provided by a component installed in WCC called FrameworkFolders. It is a scalable enterprise solution intended to replace the earlier Contribution Folders interface (the Folders_g component). For a comparison of FrameworkFolders and Folders_g, you can visit this link for more details.
There are different types of folders that can be used to organize content to fit your needs:
  • Traditional Folders: the general folder we have discussed, used to organize your content just like the folders on your computer.
  • Query Folders: folders created from a search/query result. They contain collections of documents based on the search criteria you define, and you can save a query folder just like you create a regular folder.
  • Retention Folders: a type of query folder with retention rules.


Conceptually, the Folder and the Profile are distinct in both functionality and design purpose. A profile can be considered a way to define a content "type". A folder, like a conventional folder on your laptop, is a way to aggregate and organize content. You can store content items with the same profile in the same folder or in different folders, and a folder can contain content items with the same profile or with different profiles. An example illustrates the usage of profiles and folders better. Say your company has the following types of content: legal documents, sales documents, and reports. You also have the following departments: HR, IT, and Sales. Every department may have its own legal documents and reports, while sales documents would almost always fall under the Sales department, not the other two. In this case, you may want to model the content types as profiles and aggregate the content into folders by department. You don't want to create profiles based on departments, because a department can have all kinds of content items, not just one static type. If you define profiles per department, you will find yourself having to create and update profiles all the time.

Folders and profiles do reveal some similarities in how content is managed: you can manage content items based on either a folder or a profile. For example, you can search for content items by folder or by profile, and you can batch process content items in a folder or a profile - managing workflow, governing security, updating content information, etc. On the other hand, folders and profiles differ in many ways by design:
  • A folder, just like a file, can have its own metadata. You can also propagate metadata from a folder to its subfolders and the content items within it. A profile, however, is a way to manage metadata; you cannot apply metadata on top of a profile.
  • A folder has its own security. Each folder has an owner who can modify its metadata and delete it if needed, but the folder owner doesn't gain any additional privileges over the content items inside the folder. A profile has little to do with security directly, but since a profile manages metadata, it can manage security indirectly: the "Security Group" and "Account" metadata fields can be used to manage the security of a content item.
  • With folders, you can perform basic content retention scheduling by creating a retention query folder, assigning retention attributes to the folder, and then configuring the retention schedule. There is also a specific folder type - the retention folder in WCC - which is based on a query folder with rules for content retention. Since a profile manages metadata, it has little to do with retention directly.
  • In workflow, actions can be applied on top of content items either within a folder or associated with a profile. In this perspective, folder and profile have similar effects.
  • WCC doesn't have a standalone tagging service, but you can create custom metadata for this purpose. A folder has its own metadata, so you can apply a custom tag to a folder. A profile, again, as a way to model metadata, can be used to manage any metadata field, including tags.
In a quick summary, Profile and Folder are two different concepts in WCC. Although they may reveal some level of similarity in how content can be managed, their design bases are quite distinct. A profile can be considered a way to define the "type" of a content item and to provide a customizable user interface for users to manage their content. A folder provides a virtual hierarchical structure, just like the conventional file system on your computer, to help organize and manage content. They should be used and designed according to their essential functions, to avoid inefficient content management.

Monday, April 11, 2016

Build A Concurrent Web Page With A Simple Example

Today, using a simple example, I am going to demonstrate how to build a concurrent web page in ADF using JDeveloper.

From time to time, I see requirements from various contexts that call for concurrency on web pages. Performance can benefit greatly, since multiple processes execute at the same time and the response time improves accordingly. Concurrency is also well suited to processes that involve no user interface interaction, such as sending an email after some action when the web page doesn't need to know if or when the email was sent. Such processes can run in a thread separate from the main thread that handles page rendering and user interaction. Even if you don't have a hard requirement for concurrency, why waste the opportunity? Nowadays, all servers are built with multi-core computation capabilities and designed for parallel processing. For years, CPUs have not been getting much faster, but the number of processing cores keeps growing. If you are not taking advantage of them, they go to waste.

Implementing such concurrency in a Java EE application does not seem complicated, especially considering that Java's concurrency utilities have been around since Java 5. But in my experience, such implementations seem somewhat formidable to development teams. One reason could be that without careful design and handling of Java concurrency, your expected performance gain can turn into a (much worse) degradation. Another could be the lack of examples demonstrating this use case - which is exactly what this post aims to compensate for.

In this example, I have a page built in ADF with 3 sections - left, central, and right. Each section has independent content and requires a different amount of time to process. Let's say the left takes 2 seconds, the central 3 seconds, and the right 5 seconds to return its data.

I will start with the default approach (no concurrency) followed by two others on how to improve our web page response time.

The default

There is no concurrency here. Everything runs sequentially. My page looks like this:

Since everything runs in sequence, the response time is at least 2 (left) + 3 (central) + 5 (right) = 10 seconds. It actually takes 10.59 seconds - not a surprise.

If you are wondering what the code behind the content rendering looks like, here it is:
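The original code was shown as a screenshot; conceptually, the sequential version boils down to this sketch. The method and content names are mine, and Thread.sleep stands in for the real back-end work:

```java
public class SequentialSections {

    // Simulate a section that takes `millis` to produce its content.
    static String load(long millis, String content) throws Exception {
        Thread.sleep(millis);
        return content;
    }

    public static void main(String[] args) throws Exception {
        // Each section is produced one after another, so the total
        // time is the sum: 2 + 3 + 5 = 10 seconds.
        System.out.println(load(2000, "left content"));
        System.out.println(load(3000, "central content"));
        System.out.println(load(5000, "right content"));
    }
}
```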

Now it's time to explore the better solution.

The Better

In this approach, I use an ExecutorService to manage the creation and termination of the Java threads that perform the parallel processing. Since there are 3 sections in the page that need to run in parallel, I create a thread pool with 3 threads. I then submit individual tasks to the executor service, where a task is the unit of work for each section. Since our tasks need to return values, I implement them with the Callable interface. Submitting a task returns a reference in the form of a Future object; with the Future, we can get the task's output when it completes.

CentralContentTask codes -
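The original code was shown as a screenshot; a minimal reconstruction of what such a task looks like is below. The 3-second delay and the returned string are placeholders for the real central-section work:

```java
import java.util.concurrent.Callable;

// Work unit for the central section: sleeps 3 seconds to simulate
// the slow back-end call, then returns the section's content.
public class CentralContentTask implements Callable<String> {

    @Override
    public String call() throws Exception {
        Thread.sleep(3000);               // simulate a 3-second process
        return "central section content";
    }
}
```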

As you can see, I only need to put the task logic into the call() method defined by the Callable interface.

In the managed bean that renders the web page, here is the logic used to manage the threads via the ExecutorService - task submission, termination, output retrieval, etc.

Please note: it is very important to shut down the ExecutorService after task submission; otherwise a memory leak will occur. There is no need to wait until the tasks complete before shutting down. You can also shut down the executor service immediately, regardless of task status, with shutdownNow(), but that's not what we are pursuing here.

Afterwards, we get the output of each task by calling get() on its Future. This method makes the thread wait until the task completes and then returns the output.
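Putting the pieces above together, the thread-management logic might be sketched like this. The names are mine rather than the original code, and the three section tasks are inlined as lambdas for brevity:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSections {

    public static void main(String[] args) throws Exception {
        // One thread per section, as described above.
        ExecutorService executor = Executors.newFixedThreadPool(3);

        // Submit the three section tasks; each Future is a handle
        // to that task's eventual output.
        Future<String> left    = executor.submit(task(2000, "left content"));
        Future<String> central = executor.submit(task(3000, "central content"));
        Future<String> right   = executor.submit(task(5000, "right content"));

        // Important: shut down right after submission to avoid a leak.
        // shutdown() still lets the already-submitted tasks finish.
        executor.shutdown();

        // get() blocks until each task completes, so the total wait is
        // bounded by the slowest task (~5 seconds), not the sum (~10).
        System.out.println(left.get());
        System.out.println(central.get());
        System.out.println(right.get());
    }

    // Helper that simulates a section taking `millis` to produce content.
    static Callable<String> task(long millis, String content) {
        return () -> {
            Thread.sleep(millis);
            return content;
        };
    }
}
```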

Let's run the page and see our response time -

It takes 5.37 seconds to load the page. Since we run the 3 processes in parallel, the total time is bounded by the longest processing time, which is 5 seconds.

It's much better. But not enough? Can we push all 3 processes to the background without interfering with the page rendering at all?

The Ultimate

The idea is to separate the main thread, which renders the web page, from the parallel processing of the 3 sections. It's fairly simple, and we have actually already done most of it in the previous code. The only thing blocking the main thread is the Future.get() call that returns each task's output; we need to avoid that in the main thread. So the question becomes: how do we retrieve the output of a task that takes a while, and push it back to a web page that has already rendered?

The answer is the ADF poll component. ADF poll can be used in various use cases that require pushing data to a web page after it has rendered. Here is how to leverage it in our use case.

First, we add an <af:poll> to the web page with a 1-second interval and make all 3 sections listen to this component.

Second, we implement the refresh() method bound to the poll component to retrieve the outputs of the tasks.

Here we use Future.isDone() to check whether a task is complete before getting its output, so the thread never waits on a task that hasn't finished yet.
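In plain Java, that non-blocking check amounts to the following sketch. The refreshSection method name is hypothetical; in the real page this logic lives in the managed-bean method wired to the <af:poll> listener:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PollRefreshSketch {

    // Called on every poll tick: harvest a task's output only if it has
    // finished, so the request thread never blocks on get().
    static String refreshSection(Future<String> task, String current)
            throws Exception {
        if (task.isDone()) {
            return task.get();   // safe: returns immediately once done
        }
        return current;          // task still running, keep what we have
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> task = executor.submit(() -> {
            Thread.sleep(200);   // simulate a slow section
            return "section content";
        });
        executor.shutdown();

        String shown = refreshSection(task, "loading...");  // likely still "loading..."
        task.get();                     // wait for completion (demo aid only)
        shown = refreshSection(task, shown);                // now harvested
        System.out.println(shown);
    }
}
```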

Another note: after all 3 processes complete, we need to stop the poll event. The ADF poll component offers a "timeout" property to expire the poll after a period of time, but at this time this feature doesn't work well in ADF 11g. This is a bug already filed, but very likely it will NOT be fixed in ADF 11g. To disable the poll, we use a workaround: reset the poll interval to "-1" (any negative value), which automatically disables the poll.

Let's take a look how long the page takes to render -

Yes, 407ms!

Now all 3 processes run behind the page, and as each one becomes ready, its output is pushed to the page individually. If you watch the page, you will see the left content appear first, then the central content, and finally the right content.

Most of the time, user perception is everything. In our last approach, we give the user a rendered page almost instantly and gradually add content as it becomes ready. This is the ultimate scenario. In cases where the user doesn't want to start with empty content, we can move the processing of the desired content back to the main thread along with the page rendering, so that the web page always shows the desired content at the user's first sight. In that case, the total response time will be a few hundred milliseconds plus however long the processing of the desired content takes.

The sample demoed in this post can be downloaded here. It's built in JDeveloper