
Telecommunications for Dummies – Telecom Basics and Introduction to BSS

Introduction

This post is intended as a crash course for beginners who wish to understand, at a broad level, how Business Support Subsystem (BSS) components work in a telecom carrier's network and, more importantly, how they connect to the telecom network elements over standard protocols. The text is therefore more conversational than strictly technical. Elementary examples of simple call setup procedures are also included, as they are needed for a holistic understanding of the entire ecosystem, from call setup through charging, rating and billing.

Background

Telecom operations, from the perspective of a carrier, fall into two broad categories: OSS and BSS. Operations Support Subsystems (OSS) is an aggregation of functions that enable the operator to provision and manage their inventory, customers, services and network elements. Provisioning means making an entry or record of some resource. This resource may be a customer, a value-added service, inventory such as dongles, or even a network element such as a call control switch. For example, if a new customer is acquired, his entire record (name, address, allotted phone number, etc.) has to be saved in the network; this is customer provisioning. As an example of managing inventory, dongles procured by the carrier need to be recorded in the inventory management system, and each dongle goes through a lifecycle. When a dongle is first procured, it is recorded as in stock. When it is dispatched to a retail store, its status is changed in the inventory management system to "for sale". When a customer purchases the dongle, its status changes to "sold" and it is linked to the identity of the customer who bought it.
On the other hand, Business Support Subsystems (BSS) is an aggregation of functions used for managing an operator's day-to-day business and for giving the operator complete clarity on the performance and management of his diverse lines of business. BSS includes real-time functions such as charging (post-paid/pre-paid), rating using tariff plans and applying discounts. It also includes non-real-time components such as launching new marketing campaigns, analyzing the success of individual marketing campaigns through business intelligence reports, partner management (for dealing with third-party content providers, other operators, franchisees, etc.), creating invoices, payment collection, revenue sharing with partners and billing. This note focuses on the BSS components of an operator's ecosystem.

Before going into the specifics of BSS, it is useful to review a typical call setup procedure for post-paid as well as pre-paid calls, so that some basic grounding is available on the call flows in a typical telecom network.

[Figure: message flow for a pre-paid call setup]

The scenario above depicts the flow of messages for a pre-paid subscriber. When the subscriber dials a number, the local office checks whether the subscriber has enough balance in his/her account. Only if the balance is available does the call proceed further.

NOTE-1: There may be one or more transit exchanges between the originating local office and the terminating local office.
NOTE-2: The messages that flow between the local exchange and the pre-paid platform use the INAP protocol. INAP stands for Intelligent Network Application Part and is the topmost layer of the SS7 protocol stack. Both the local office and the Service Control Point (SCP) understand INAP messages.
NOTE-3: The other levels of the SS7 stack are MTP1, MTP2, MTP3, SCCP and TCAP. MTP stands for Message Transfer Part; SCCP stands for Signalling Connection Control Part.
TCAP stands for Transaction Capability Application Part.

Main steps in charging a pre-paid call in a fixed-line network:

When a pre-paid subscriber makes a call, it arrives at a local exchange (also called a local office in the US). The local exchange looks at the calling number (A-party) and finds that the call was made by a pre-paid customer, so it cannot connect the call straight away as it would for a post-paid customer. The local exchange now has to talk to a pre-paid platform. It sends a message to the pre-paid platform informing it of the A-party and B-party numbers. This message is received by a component of the pre-paid platform called the Service Control Point (SCP). The SCP is simply a software module that receives and understands the messages sent by the local exchange.

On receiving the message from the local exchange, the SCP communicates with a Rating Engine, the second component of the pre-paid platform. The rating engine is the heart of the pre-paid platform. As tariff plans get more complex and more services need to be charged, the rating engine must have the flexibility and functionality to meet the business requirements. Its database holds the current balance of all the pre-paid customers, and since a single pre-paid platform serves hundreds of thousands of customers, the rating engine contains a large number of customer balance records. The rating engine looks at the A-party number and queries its database to find the A-party's current balance. If the A-party has zero or insufficient balance, it informs the SCP that the calling customer has insufficient balance. The SCP in turn sends a message to the local exchange, and the local exchange rejects the call.
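As a rough sketch, the balance check just described boils down to two rules: a zero balance is rejected outright, and otherwise the balance must cover at least the minimum rate of one call pulse for the destination. The class, method names and the 10-cent/100-cent example rates below are purely illustrative, not from any real IN platform.

```java
// Sketch of the balance check a rating engine performs before letting a
// pre-paid call proceed. Rates and names are invented for illustration.
public class BalanceCheck {

    // Minimum charge (in cents) for one call pulse, by destination class.
    static int minimumPulseCents(boolean international) {
        return international ? 100 : 10;   // assumed example rates
    }

    // The SCP asks the rating engine: can this A-party afford at least one pulse?
    static boolean canProceed(int balanceCents, boolean international) {
        if (balanceCents <= 0) {
            return false;                  // zero balance: reject, no further analysis
        }
        return balanceCents >= minimumPulseCents(international);
    }

    public static void main(String[] args) {
        System.out.println(canProceed(5, false));    // 5c vs a 10c local pulse -> false
        System.out.println(canProceed(50, true));    // 50c vs a 100c intl pulse -> false
        System.out.println(canProceed(50, false));   // true
    }
}
```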
If, however, the rating engine finds that the calling party has sufficient balance, then the local exchange is informed via the SCP that the call can proceed. If the balance of the calling party is zero, no further analysis is required by the rating engine to reject the call. Otherwise, it has to check, by looking at the B-party number, whether the balance is sufficient to cover the minimum rate of the call pulse. For example, if a local call costs 10 cents per minute and the A-party's current balance is 5 cents, then the balance is not sufficient. For an international call, even a 50-cent balance may not be sufficient.

Assuming the balance was sufficient and the pre-paid platform has allowed the call to proceed, the local exchange will let the call continue, and the call will eventually be connected, via one or more exchanges, to the called subscriber, whose phone will ring. If the call is abandoned for any reason (the calling party hanging up before the B-party answers, no answer by the B-party, etc.), the local exchange must again send a message to the SCP informing it that the call has not matured.

At the time of allowing the call, the rating engine had reserved a certain amount from the subscriber's balance; the amount reserved depends on the customer's tariff plan. To avoid frequent requests for the reservation of amounts, the rating engine may be asked to grant a larger chunk of units from the balance. As an example, assume the calling party had a balance of USD 100 when the call was made. When the message reaches the pre-paid platform, the rating engine, on finding that there is enough balance, will not only respond to the local exchange about the sufficiency of the balance as described above, but also reserve an amount, say USD 5. Temporarily, therefore, the balance reduces to USD 95.
It is important to understand that reserving an amount is not equivalent to actually reducing the balance of the calling party. At this point the reservation only signifies intent; the actual deduction happens only if the call is successful and usage takes place. When the call is answered, the deduction happens periodically from the reserved amount of USD 5. How much must be deducted, and at what periodicity? That ultimately depends on the tariff plan the calling party has subscribed to. Assuming the call charges are 50 cents per minute, the reserved amount of USD 5 is enough for 10 minutes of conversation. This is the validity time. When informing the local exchange that the calling party has sufficient balance and the call can be connected, the SCP also conveys the validity time. The local exchange thus knows that if this call matures, it can let the call continue for the first 10 minutes without contacting the pre-paid platform again.

If the conversation ends in less than 10 minutes, say the call lasts just 3 minutes, the local exchange sends a message to the charging platform and only USD 1.50 is deducted by the pre-paid platform at the end of the call. The remaining USD 3.50 is returned to the calling party's account balance, leaving a closing balance of USD 98.50. If instead the conversation exceeds 10 minutes, the local exchange sends another request to reserve more units; this has to be done before the validity time expires. The pre-paid platform will reserve another chunk of USD 5 and inform the local exchange accordingly. The unit reservation request may fail, which happens when the subscriber has no units left in his balance. In that scenario, the pre-paid platform responds with a message informing the local exchange that the credit limit has been reached.
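The reserve/settle cycle above can be sketched as follows, with amounts in cents so that USD 100 is 10,000 cents. The class and its method names are invented for illustration; a real rating engine would of course also handle concurrency, multiple reservations and expiry.

```java
// A minimal sketch of pre-paid reservation: a chunk is withheld from the
// balance ("intent"), and on call release the actual usage is deducted and
// the unused remainder returned. Names and amounts are illustrative.
public class PrepaidAccount {
    private int balanceCents;
    private int reservedCents;

    PrepaidAccount(int openingBalanceCents) {
        this.balanceCents = openingBalanceCents;
    }

    // Reserve a chunk; returns the validity time in minutes it buys at the
    // given rate, or -1 if the balance cannot cover the chunk (credit limit).
    int reserve(int chunkCents, int ratePerMinuteCents) {
        if (balanceCents < chunkCents) return -1;
        balanceCents -= chunkCents;        // temporarily withheld, not yet spent
        reservedCents += chunkCents;
        return chunkCents / ratePerMinuteCents;
    }

    // On call release: deduct actual usage, refund the unused reservation.
    void settle(int minutesUsed, int ratePerMinuteCents) {
        int cost = Math.min(reservedCents, minutesUsed * ratePerMinuteCents);
        balanceCents += reservedCents - cost;
        reservedCents = 0;
    }

    int balanceCents() { return balanceCents; }

    public static void main(String[] args) {
        PrepaidAccount a = new PrepaidAccount(10_000);   // USD 100
        System.out.println(a.reserve(500, 50));          // USD 5 at 50c/min -> 10 minutes
        a.settle(3, 50);                                 // 3-minute call costs 150c
        System.out.println(a.balanceCents());            // 9850 = USD 98.50
    }
}
```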
On being told that the credit limit is reached, the local exchange immediately terminates the call.

The architecture for charging a pre-paid call in a mobile (GSM) network is very similar; only the INAP protocol layer is replaced by CAMEL.

NOTE-1: CAMEL stands for Customised Applications for Mobile networks Enhanced Logic.
NOTE-2: There will be a base station (BTS) and a base station controller (BSC) before the call reaches the originating MSC; likewise, there will be a BSC and a BTS on the terminating side.
NOTE-3: There may be one or more transit exchanges between the originating and terminating MSCs. These are called Gateway MSCs (GMSCs) in mobile networks.
NOTE-4: The architecture in a CDMA mobile network is identical except that CAMEL is replaced by the IS-826 protocol layer.

Details of the SMP and the VMS:

Service Management Point (SMP): The SMP performs the management functions for the IN platform. For example, all the pre-paid customers must be recorded in the IN platform along with their initial balance; this is typically called provisioning of customers, and it is a task performed by the SMP. The SMP will typically have local terminals where operators can do this through a set of commands called the command line interface (CLI). There may also be a need to periodically delete (de-provision) customers who have left the operator's pre-paid service. The SMP is also connected to a voucher management system, from which commands flow to the SMP for bulk provisioning, so that a large number of customers can be provisioned in the pre-paid system simultaneously.

Voucher Management System (VMS): The VMS allows pre-paid subscribers to recharge their accounts using pre-paid vouchers, allowing operators to run widely distributed dealer networks efficiently and easily. It lets subscribers add funds to their accounts within the home network or in roaming networks, regardless of location, time of day or model of phone.
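As a rough sketch, the core of the VMS recharge path, redeeming a single-use voucher PIN and barring a number after repeated bad PINs, might look like the following. The three-attempt threshold and every name here are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of voucher redemption with PIN-abuse blocking. A voucher is single
// use; a number with too many consecutive bad PINs is barred. Illustrative.
public class VoucherManager {
    private static final int MAX_FAILURES = 3;
    private final Map<String, Integer> voucherValueCents = new HashMap<>(); // PIN -> value
    private final Map<String, Integer> failures = new HashMap<>();         // number -> bad tries

    void loadVoucher(String pin, int valueCents) {
        voucherValueCents.put(pin, valueCents);
    }

    // Returns the credited amount in cents, or 0 if the PIN is bad or the
    // calling number has been barred.
    int redeem(String msisdn, String pin) {
        if (failures.getOrDefault(msisdn, 0) >= MAX_FAILURES) return 0; // barred
        Integer value = voucherValueCents.remove(pin);                  // single use
        if (value == null) {
            failures.merge(msisdn, 1, Integer::sum);
            return 0;
        }
        failures.remove(msisdn);                                        // reset on success
        return value;
    }

    public static void main(String[] args) {
        VoucherManager vm = new VoucherManager();
        vm.loadVoucher("4711", 500);
        System.out.println(vm.redeem("15551234", "9999"));  // bad PIN -> 0
        System.out.println(vm.redeem("15551234", "4711"));  // 500
        System.out.println(vm.redeem("15551234", "4711"));  // already used -> 0
    }
}
```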
The VMS cuts down administrative, bookkeeping and servicing expenses and makes it easy for subscribers to keep track of their account balance without service centres. Features of a VMS include:

- Account recharging via voice services: subscribers can recharge their accounts from their own telephone as well as from any tone-dial phone.
- Account recharging via non-voice services: subscribers can top up their own accounts, as well as the account of any other subscriber on the same network, by sending an SMS request with a PIN code to a service number. Accounts can also be recharged from a special website of the carrier.
- Account recharging via operator: subscribers can recharge their accounts via call centre operators.
- Distributed system: the prepaid card system allows distributed systems with card roaming to be built, giving subscribers the ability to recharge their accounts while roaming.
- Blocking: the VMS can bar a specific telephone number from accessing the service after several consecutive unsuccessful attempts to enter a PIN code from that number, protecting the operator from fraud.
- Advertising: additional profit can be generated by placing adverts on cards, which can also be a useful marketing tool for the operator, for example to advertise new services.
- Card status: information about card usage (transactions made) is recorded automatically in the operator's billing system.

Revisiting post-paid charging:

For post-paid subscribers, there is no need to check the balance. The calls are allowed to proceed without any messaging to a pre-paid platform, and the call details are captured in the form of a Call Detail Record (CDR) at the end of the call. These CDRs are used later for billing purposes.
A typical CDR captures the following information:

- Calling party number
- Called party number
- Call start time
- Call end time
- Call duration (end time minus start time)
- Call identifier

This information is saved as a file in the local exchange or the MSC and is pushed to the charging system for post-processing. The charging system has a rating engine and a mediation engine. The rating engine analyzes the CDR files and determines the rate to be applied to each CDR. Once rating is completed, the CDR is known as a rated CDR and is pushed to the mediation engine. The mediation engine is required because CDRs may arrive from more than one source, and the format of the CDRs may differ from source to source. The mediation engine post-processes the rated CDRs from multiple sources and reformats them into a common file. This common file is then pushed to the billing system, which generates the itemized bill for each customer based on the CDRs.

We have so far described the charging of voice calls. For post-paid charging of a data call in CDMA, the data call path bifurcates at the PCF (Packet Control Function), which is a logical function of the Base Station Controller; the rating engine, mediation engine and billing system are common to data calls and voice calls. The corresponding architecture for charging a Mobile WiMAX data call differs only in that an ASN gateway acts as the controller for data calls, with the AAA sending charging events over the RADIUS protocol. The AAA (Authentication, Authorization and Accounting) server receives the messages on the RADIUS protocol. There may be a dedicated RADIUS-to-DIAMETER converter in the AAA server, which converts these messages to the DIAMETER protocol and sends them to the charging platform.
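Going back to the CDR fields listed above, post-paid rating of a single voice CDR can be sketched as follows. The round-up-to-whole-minutes policy and the 50 cents/minute rate are illustrative assumptions, not a statement of how any particular rating engine works.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Sketch of rating one CDR: duration is end time minus start time, and the
// charge is the duration, rounded up to whole minutes, times a per-minute
// rate. Policy and rate are illustrative.
public class CdrRater {

    // Returns the charge in cents for a call with the given start/end times.
    static long rateCents(LocalDateTime start, LocalDateTime end,
                          long ratePerMinuteCents) {
        long seconds = Duration.between(start, end).getSeconds();
        long minutes = (seconds + 59) / 60;   // round up to a whole minute
        return minutes * ratePerMinuteCents;
    }

    public static void main(String[] args) {
        long charge = rateCents(
                LocalDateTime.of(2012, 1, 1, 10, 0, 0),
                LocalDateTime.of(2012, 1, 1, 10, 2, 30),  // 150 seconds
                50);
        System.out.println(charge);  // billed as 3 minutes -> 150 cents
    }
}
```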
On the charging platform side, the DIAMETER protocol's Rf interface is used for post-paid (offline) charging, while its Ro interface is used for pre-paid (online) charging.

Introduction to the real-time components of BSS:

We can now discuss the BSS system in more detail; there may be some overlap with the information described above. A modern BSS system works on DIAMETER interfaces and integrates with IMS, Mobile WiMAX and LTE over DIAMETER protocols. This is only an architectural view, meant to show some of the major components of a BSS system and to provide a glimpse into the complexity of a modern BSS platform; we will discuss the basic components of the BSS system conceptually, without going into the technical details. The most important real-time, network-facing components of a BSS system are:

1. Charging
2. Rating
3. Mediation
4. Billing
5. Reconciliation

1. Charging

Charging of customers can be done in two ways: pre-paid (online) charging and post-paid (offline) charging. The first step of revenue generation starts with the charging process, where the network elements identify the events which need to be charged. Examples include sending an SMS or MMS, dialling a number, and downloading content. Network elements such as the local exchange (office) and the SMSC (Short Message Service Centre) generate these "chargeable" events towards the charging platform. Chargeable events can be thought of as "intents" to charge the customer based on his/her actions. At this stage, it is not guaranteed that these events will actually lead to revenue realization. For example, a chargeable event will be generated for a missed call or a failed call setup; as the call never connected successfully, the customer will not be liable to pay for this event. The operator may, however, change his policy and charge the customer even for missed calls if needed.
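A minimal sketch of this "intent" semantics: chargeable events are generated regardless of outcome, and a separate policy step decides which of them are passed on for rating. Here the policy is the default one described above (only successful events are billed); all the names and fields are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of chargeable events as "intents": every event is generated, but
// only those matching the operator's policy lead to revenue. Illustrative.
public class ChargeableEvents {

    static final class Event {
        final String type;        // e.g. "VOICE", "SMS", "MMS", "CONTENT"
        final boolean successful; // a missed or failed call is not successful
        Event(String type, boolean successful) {
            this.type = type;
            this.successful = successful;
        }
    }

    // Default policy: only successful events lead to revenue realization.
    static List<Event> billable(List<Event> events) {
        return events.stream()
                .filter(e -> e.successful)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("VOICE", true),
                new Event("VOICE", false),   // missed call: generated, not billed
                new Event("SMS", true));
        System.out.println(billable(events).size());   // 2
    }
}
```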
Interfaces for charging: charging platforms support multiple interfaces to receive events. The most common charging interfaces are based on the RADIUS and DIAMETER protocols. Legacy systems used to charge calls on the IN (Intelligent Network) pre-paid platform, and some legacy systems also used Web Service interfaces for charging; web services work over the HTTP protocol and exchange messages using SOAP (Simple Object Access Protocol).

2. Rate Determination and Rating

The next logical step after receiving charging events is rate determination and rating. Once the charging system has received an event, the rate is calculated for the customer depending upon the current tariff plan in use. The charging system needs to decide the number of units to be consumed for the particular charging event. For post-paid subscribers, these units are determined and attached to a Call Detail Record (CDR); this "rated CDR" is then dumped as a file. For pre-paid subscribers, the units are determined and compared with the subscriber's account balance. If the account balance is sufficient, the charging system reserves the required number of units from the subscriber's account and prepares to deduct them from the available balance.

Units may have different manifestations. Units can represent money; they can also represent an allowable limit for data download in megabytes (units measured by volume), or talk time. In some special cases, the operator may also represent units in terms of the number of free calls offered or the number of free SMS messages allowed. Before the charging system rates a particular charging event and actually deducts units, there is an intermediary step: applying discounts, if applicable.
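The different manifestations of units can be sketched as a single balance keyed by unit type, with one consume operation shared by all of them. The unit names and the tiny API below are invented for illustration; a real charging system would track expiry, reservations and per-plan rules as well.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch of a subscriber balance whose units may be money, data volume,
// talk time or free-event counters. Names are illustrative.
public class UnitBalance {
    enum Unit { CENTS, MEGABYTES, SECONDS, FREE_SMS }

    private final Map<Unit, Long> balances = new EnumMap<>(Unit.class);

    void credit(Unit unit, long amount) {
        balances.merge(unit, amount, Long::sum);
    }

    // Attempt to consume units for one charging event; false means the
    // balance of that unit type is insufficient.
    boolean consume(Unit unit, long amount) {
        long have = balances.getOrDefault(unit, 0L);
        if (have < amount) return false;
        balances.put(unit, have - amount);
        return true;
    }

    public static void main(String[] args) {
        UnitBalance ub = new UnitBalance();
        ub.credit(Unit.MEGABYTES, 10);               // 10 MB data allowance
        System.out.println(ub.consume(Unit.MEGABYTES, 4));  // true, 6 MB left
        System.out.println(ub.consume(Unit.MEGABYTES, 7));  // false
    }
}
```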
To apply a discount: if a charging event cost the subscriber $1 and the subscriber's tariff plan offered him a 50% discount on all calls, the applicable rate would be only 50 cents. At this stage, it is important to understand the relationship between a tariff plan, the components of a tariff plan, the offers attached to the plan, the rate to be applied based on the offers, and the validity semantics of the offers. These are important inputs to the rating engine. The final rate is calculated based on the tariff plan, the offers and discounts applicable on the base plan, and the validity of the offers.

Consider a tariff plan with four components:

- Voice
- Messaging
- Data
- Content

The voice component applies to the usual calls the subscriber initiates. The messaging component in this example refers to SMS messages. The data component refers to internet browsing and downloads from websites. Finally, the content component refers to the purchase of premium content such as movies, ringtones and ebooks, for example from an operator's application store. Each component of this tariff plan has an offer attached to it. In this example, the customer gets the first 100 voice calls free, the first 50 SMS messages free, the first 10 MB of downloads free and 2 ringtones free.

These offers have validity applicable to them. The first 100 voice calls are free but must be consumed within a month; the same applies to the free SMS offer and the data download offer. The 2 free ringtones, however, are a one-time offer: the customer may download at most 2 ringtones for free, and normal charges apply from the 3rd ringtone download onwards. The last concept is that of usage counters. At the start of each month, the free voice call quota is re-initialized to a value of 100.
With each call made by the customer, this value of 100 is decremented until it reaches zero. Once the free call quota is zero, normal charges apply to all subsequent calls. Similarly, the messaging usage quota is reset to 50 every month.

Validity can be represented in several forms, and the validity semantics of an offer have to be very flexible; this flexibility determines the business agility of an operator. Some common validity options are:

- An offer may be valid from the start of each month. This is similar to the free voice call usage counter discussed in the example earlier.
- An offer may be valid only on certain days of the week. For example, all voice calls are discounted 30% on weekends.
- An offer may apply only within a time band of the day. For example, the operator may offer discounted call rates at night (between 11 pm and 5 am) to encourage usage during these lean hours.
- An offer may apply only on special days such as Diwali, New Year and Christmas.
- The validity of an offer can be confined to a closed user group. For example, calls between users belonging to a closed user group are cheaper than calls made outside the group.
- Some offers are valid based on the destination of the called party. For example, a special rate may apply to calls made between New York and Chicago.
- Usage-based validity is a very interesting case. For example, when a customer's usage crosses $30 in a given month, all subsequent calls for that month are offered at a 25% discount. This is a kind of reward given to high-ARPU customers.
- The last option is "unconditional": the offer is valid at all times and for all calls.
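The free-quota counter and the discount step can be sketched together. The numbers follow the running examples above (free calls with a monthly reset, a $1 base rate and a 50% discount once the quota is exhausted); the class and method names are invented.

```java
// Sketch of a monthly free-call counter: free calls are consumed first, then
// the normal rate applies, less any percentage discount on the plan. Names
// and numbers are illustrative.
public class FreeCallCounter {
    private final int monthlyQuota;
    private final int normalRateCents;
    private final int discountPercent;
    private int freeCallsLeft;

    FreeCallCounter(int monthlyQuota, int normalRateCents, int discountPercent) {
        this.monthlyQuota = monthlyQuota;
        this.normalRateCents = normalRateCents;
        this.discountPercent = discountPercent;
        this.freeCallsLeft = monthlyQuota;
    }

    // Called at the start of each month: the quota is re-initialized.
    void monthlyReset() {
        freeCallsLeft = monthlyQuota;
    }

    // Returns the charge in cents for one call.
    int chargeForCall() {
        if (freeCallsLeft > 0) {
            freeCallsLeft--;
            return 0;                       // consumed from the free quota
        }
        return normalRateCents * (100 - discountPercent) / 100;
    }

    public static void main(String[] args) {
        FreeCallCounter c = new FreeCallCounter(2, 100, 50); // 2 free, $1, 50% off
        System.out.println(c.chargeForCall());  // 0 (free)
        System.out.println(c.chargeForCall());  // 0 (free)
        System.out.println(c.chargeForCall());  // 50 (discounted rate)
        c.monthlyReset();
        System.out.println(c.chargeForCall());  // 0 again
    }
}
```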
3. Mediation

The mediation process is the next logical step after rating. Once the rate is determined, as explained in the previous section, it is applied to the CDR file. This rated CDR file is then pulled by the mediation engine. The mediation engine can handle CDRs from various sources: the MSC, the local office, the SMSC, etc. It converts all these CDR files into a common file format, chosen so that it is understandable by the billing system. The resulting file is then provided to the billing system for post-processing.

4. Billing

In the billing process, CDRs are provided by the mediation engine to the billing system, which reads and processes them. The output of this process is an itemized bill in human-readable form. As the CDRs which reach the billing system are already rated, the billing system can calculate the final bill for a subscriber along with the rate applied to each charging event. The details of the CDRs are presented in the form of a bill which can then be e-mailed or dispatched to the subscriber's billing address. These days, some advanced billing systems also provide billing-level discounts. For example, if a high-paying customer generates a bill of over USD 100 every month for 6 consecutive months, the billing system can give this customer a 10% discount on his next bill. The billing system may even ask the rating engine to provide this high-ARPU customer with a quota of free calls from the next month onwards.

Wholesale/Interconnect Billing: wholesale billing applications include a variety of capabilities. Traditionally, this area included inter-carrier settlement capabilities, which were later extended to interconnect billing applications.
In today's competitive markets and complex value chains, wholesale billing has expanded further to include, among others, roaming, wholesale operators and resellers, Mobile Virtual Network Operators, content providers and e-commerce. There is now an array of applications in this area providing charging, billing and settlement capabilities on a raw-data basis, an individual-transaction basis and a bulk basis, across a variety of services and platforms, preferably in one single system.

5. Reconciliation

Reconciliation is an offline process of sorting the CDRs of the calls made in a certain period to determine the following:

- How many calls were made to another carrier's network?
- How many of the calls made to another carrier's network were local/regional/international?
- How many calls from other networks terminated in my network?
- How many calls from a 3rd-party network used my network as transit and actually terminated in some other network?

Similar details for SMS and MMS messages are also post-processed. Based on such information, the operator records the amount of revenue which has to be shared with other network operators. This revenue sharing takes place because all these operators share points of interconnect (POIs) to each other's networks and allow calls to be made from one network to another. From this, the operator arrives at an amount which other operators need to pay him, and an amount which he needs to pay to other operators, for providing interconnect services to each other. This process is known as reconciliation.

Business perspective and an introduction to the non-real-time components of BSS:

Business teams of a carrier are responsible for designing and planning tariff plans, which are later launched in the form of marketing campaigns.
Each tariff plan needs to be designed carefully, keeping in mind the inputs obtained from competitive analysis of other carriers' tariff plans. The BSS component which manages the marketing campaigns for the carrier is called Campaign Management. The success or failure of a particular marketing campaign is gauged by another component known as Business Intelligence (BI). The BI engine provides the business teams with clarity on the performance of a particular marketing campaign based on various parameters, for example:

- Popularity of a particular marketing campaign (tariff plan) based on market segments such as children, young executives, etc.
- Popularity of a campaign based on region: Florida, California, New Jersey, etc.
- Popularity of a campaign based on age.
- Popularity of a campaign based on customer segment: high-ARPU customers, traditionally low-ARPU customers, mid-ARPU customers, enterprise customers, etc.

Some of the other major functions of a BSS system which are critical from the viewpoint of a network carrier are:

Revenue Assurance: the main revenue assurance application areas are:

- Detection of data discrepancies between systems and data repositories that might affect the ability to generate revenue or might increase costs, including configuration data (e.g. between Customer Relationship Management systems, inventory and the network), events data (e.g. between the call switch, mediation and billing) and interconnect/partner billing.
- Detection of data integrity and correctness problems, e.g. a switch that incorrectly rounds call durations.
- Rating and billing verification: verifying that the rating and billing rules are applied correctly, e.g. is the customer billed according to the correct plan, and is he billed correctly according to that plan?
- Investigation of revenue leakages, finding and correcting their root causes to prevent the recurrence of similar leakages.
- Grouping and classification of leakages.
- Equipment and system testing: proactively testing equipment, systems and processes to verify that they provide accurate information, e.g. using test call generation.
- Trouble reports and alarms: generation and tracking of revenue assurance trouble reports and alarms.
- Automation of revenue assurance controls and data collection.
- Automation of leakage correction.
- Generation of revenue leakage reports and documentation, both for internal needs and to support regulatory compliance activities.

Fraud Management: investigating, preventing and responding to activities that indicate fraudulent use of networks or systems. This is achieved by effective fraud management systems coupled with the instrumentation and monitoring that enable potential fraudulent activities to be identified. There are close linkages between fraud identification and anomaly detection. For example:

- Someone fraudulently tops up a pre-paid account from the system back end without using a voucher.
- Someone reduces the value of a post-paid bill just before it is processed by the billing system.
- Someone deletes CDRs or reduces the rate on each CDR.

Conclusion: the domain of BSS is huge and it is not possible to cover all its aspects in a single post. However, this post provided an introduction to some of the important concepts of this area and to how the BSS connects to the network elements of a telecom system over standard interfaces. We also discussed some of the most critical concepts of a BSS platform which directly influence the operator's revenue realization capabilities, such as rating, charging, billing, reconciliation, revenue assurance and fraud management.

Reference: Telecommunications for Dummies – Telecom Basics and Introduction to BSS from our JCG partner Aayush at the Aayush weblog.

FreeChart with Groovy and Apache POI

The point of this article is to show you how to parse data from an Excel spreadsheet and turn it into a series of graphs. Recently I was looking for an opportunity to get some practice with JFreeChart and ended up looking at a dataset released by the Canadian government as part of their 'Open Data' initiative. The particular set of data is entitled 'Number of Seedlings Planted by Ownership, Species' and is delivered as an Excel spreadsheet, hence the need for the Apache POI library in order to read the data in. As is fairly usual, at least in my experience, the Excel spreadsheet is designed primarily for human consumption, which adds a degree of complexity to the parsing. Fortunately the spreadsheet does follow a repetitive pattern that can be accounted for fairly easily, so this is not insurmountable. Still, we want to get the data out of Excel to make it more approachable for machine consumption, so the first step is to convert it to a JSON representation. Once it is in this much more transportable form, we can readily convert the data into graph visualizations using JFreeChart.

The spreadsheet format

Excel as a workplace tool is very well established, can increase individual productivity and is definitely a boon to your average office worker. The problem is that once the data is there, it's often trapped there. Data tends to be laid out based on human aesthetics and not on parsability, meaning that unless you want to use Excel itself to do further analysis, there are not a lot of options. Exports to more neutral formats like CSV suffer from the same problem, namely that there's no way to read the data in coherently without designing a custom parser. In this particular case, parsing the spreadsheet has to take into account the following:

- Merged cells, where one column is meant to represent a fixed value for a number of sequential rows.
- Column headers that do not represent all of the actual columns.
Here we have a ‘notes’ column for each province that immediately follows its data column. As the header cells are merged across both of these columns, they cannot be used directly to parse the data. Data is broken down into several domains that lead to repetitions in the format. The data contains a mix of numbers where results are available and text where they are not. The meanings of the text entries are described in a table at the end of the spreadsheet. Section titles and headers are repeated throughout the document, apparently trying to match some print layout, or perhaps just trying to provide some assistance to those scrolling through the long document. Data in the spreadsheet is first divided into reporting by Provincial crown land, private land, Federal land, and finally a total for all of them. Within each of these sections, data is reported for each tree species on a yearly basis across all Provinces and Territories along with aggregate totals of these figures across Canada. Each of these species data-tables has an identical row/column structure, which allows us to create a single parsing structure sufficient for reading in data from each of them separately.

Converting the spreadsheet to JSON

For parsing the Excel document, I’m using the Apache POI library and a Groovy wrapper class to assist in processing. The wrapper class is very simple but allows us to abstract most of the mechanics of dealing with the Excel document away. The full source is available on this blog post from author Goran Ehrsson. The key benefit is the ability to specify a window of the file to process based on ‘offset’ and ‘max’ parameters provided in a simple map. Here’s an example for reading data for the text symbols table at the end of the spreadsheet. We define a Map which states which sheet to read from, which line to start on (offset) and how many lines to process.
The ExcelBuilder class (which isn’t really a builder at all) takes in the path to a File object and under the hood reads that into a POI HSSFWorkbook which is then referenced by the call to the eachLine method. public static final Map SYMBOLS = [sheet: SHEET1, offset: 910, max: 8] ... final ExcelBuilder excelReader = new ExcelBuilder(data.absolutePath) Map<String, String> symbolTable = [:] excelReader.eachLine(SYMBOLS) { HSSFRow row -> symbolTable[row.getCell(0).stringCellValue] = row.getCell(1).stringCellValue } Eventually when we turn this into JSON, it will look like this: 'Symbols': { '...': 'Figures not appropriate or not applicable', '..': 'Figures not available', '--': 'Amount too small to be expressed', '-': 'Nil or zero', 'p': 'Preliminary figures', 'r': 'Revised figures', 'e': 'Estimated by provincial or territorial forestry agency', 'E': 'Estimated by the Canadian Forest Service or by Statistics Canada' } Now processing the other data blocks gets a little bit trickier. The first column consists of 2 merged cells, and all but one of the other headers actually represents two columns of information: a count and an optional notation. The merged column is handled by a simple EMPTY placeholder and the extra columns by processing the list of headers: public static final List<String> HEADERS = ['Species', 'EMPTY', 'Year', 'NL', 'PE', 'NS', 'NB', 'QC', 'ON', 'MB', 'SK', 'AB', 'BC', 'YT', 'NT *a', 'NU', 'CA'] /** * For each header add a second following header for a 'notes' column * @param strings * @return expanded list of headers */ private List<String> expandHeaders(List<String> strings) { strings.collect {[it, "${it}_notes"]}.flatten() } Each data block corresponds to a particular species of tree, broken down by year and Province or Territory. Each species is represented by a map which defines where in the document that information is contained so we can iterate over a collection of these maps and aggregate data quite easily.
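The header-expansion step is independent of Groovy; a rough plain-Java sketch of the same idea (class and method names here are mine, not from the article) looks like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeaderExpander {

    // For each data header, emit the header itself followed by a
    // "<header>_notes" column, mirroring the notes column that sits
    // immediately after each province's data column in the spreadsheet.
    static List<String> expandHeaders(List<String> headers) {
        List<String> expanded = new ArrayList<>();
        for (String h : headers) {
            expanded.add(h);
            expanded.add(h + "_notes");
        }
        return expanded;
    }

    public static void main(String[] args) {
        System.out.println(expandHeaders(Arrays.asList("NL", "PE")));
        // [NL, NL_notes, PE, PE_notes]
    }
}
```

The doubled header list can then be walked in lockstep with the physical cells of each row, which is exactly how the merged header cells are worked around.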
This set of constants and code is sufficient for parsing all of the data in the document. public static final int HEADER_OFFSET = 3 public static final int YEARS = 21 public static final Map PINE = [sheet: SHEET1, offset: 6, max: YEARS, species: 'Pine'] public static final Map SPRUCE = [sheet: SHEET1, offset: 29, max: YEARS, species: 'Spruce'] public static final Map FIR = [sheet: SHEET1, offset: 61, max: YEARS, species: 'Fir'] public static final Map DOUGLAS_FIR = [sheet: SHEET1, offset: 84, max: YEARS, species: 'Douglas-fir'] public static final Map MISCELLANEOUS_SOFTWOODS = [sheet: SHEET1, offset: 116, max: YEARS, species: 'Miscellaneous softwoods'] public static final Map MISCELLANEOUS_HARDWOODS = [sheet: SHEET1, offset: 139, max: YEARS, species: 'Miscellaneous hardwoods'] public static final Map UNSPECIFIED = [sheet: SHEET1, offset: 171, max: YEARS, species: 'Unspecified'] public static final Map TOTAL_PLANTING = [sheet: SHEET1, offset: 194, max: YEARS, species: 'Total planting'] public static final List<Map> PROVINCIAL = [PINE, SPRUCE, FIR, DOUGLAS_FIR, MISCELLANEOUS_SOFTWOODS, MISCELLANEOUS_HARDWOODS, UNSPECIFIED, TOTAL_PLANTING] public static final List<String> AREAS = HEADERS[HEADER_OFFSET..-1]...final Closure collector = { Map species -> Map speciesMap = [name: species.species] excelReader.eachLine(species) {HSSFRow row -> //ensure that we are reading from the correct place in the file if (row.rowNum == species.offset) { assert row.getCell(0).stringCellValue == species.species } //process rows if (row.rowNum > species.offset) { final int year = row.getCell(HEADERS.indexOf('Year')).stringCellValue as int Map yearMap = [:] expandHeaders(AREAS).eachWithIndex {String header, int index -> final HSSFCell cell = row.getCell(index + HEADER_OFFSET) yearMap[header] = cell.cellType == HSSFCell.CELL_TYPE_STRING ? 
cell.stringCellValue : cell.numericCellValue } speciesMap[year] = yearMap.asImmutable() } } speciesMap.asImmutable() } The defined collector Closure returns a map of all species data for one of the four groupings (Provincial, private land, Federal and totals). The only thing that differentiates these groups is their offset in the file so we can define maps for the structure of each simply by updating the offsets of the first. public static final List<Map> PROVINCIAL = [PINE, SPRUCE, FIR, DOUGLAS_FIR, MISCELLANEOUS_SOFTWOODS, MISCELLANEOUS_HARDWOODS, UNSPECIFIED, TOTAL_PLANTING] public static final List<Map> PRIVATE_LAND = offset(PROVINCIAL, 220) public static final List<Map> FEDERAL = offset(PROVINCIAL, 441) public static final List<Map> TOTAL = offset(PROVINCIAL, 662) private static List<Map> offset(List<Map> maps, int offset) { maps.collect { Map map -> Map offsetMap = new LinkedHashMap(map) offsetMap.offset = offsetMap.offset + offset offsetMap } } Finally, we can iterate over these simple map structures applying the collector Closure and we end up with a single map representing all of the data. def parsedSpreadsheet = [PROVINCIAL, PRIVATE_LAND, FEDERAL, TOTAL].collect { it.collect(collector) } Map resultsMap = [:] GROUPINGS.eachWithIndex {String groupName, int index -> resultsMap[groupName] = parsedSpreadsheet[index] } resultsMap['Symbols'] = symbolTable And the JsonBuilder class provides an easy way to convert any map to a JSON document ready to write out the results. Map map = new NaturalResourcesCanadaExcelParser().convertToMap(data) new File('src/test/resources/NaturalResourcesCanadaNewSeedlings.json').withWriter {Writer writer -> writer << new JsonBuilder(map).toPrettyString() }

Parsing JSON into JFreeChart line charts

All right, so now that we’ve turned the data into a slightly more consumable format, it’s time to visualize it.
For this case I’m using a combination of the JFreeChart library and the GroovyChart project which provides a nice DSL syntax for working with the JFreeChart API. It doesn’t look to be under development presently, but aside from the fact that the jar isn’t published to an available repository it was totally up to this task. We’re going to create four charts for each of the fourteen areas represented for a total of 56 graphs overall. All of these graphs contain plotlines for each of the eight tree species tracked. This means that overall we need to create 448 distinct time series. I didn’t do any formal timings of how long this takes, but in general it came in somewhere under ten seconds to generate all of these. Just for fun, I added GPars to the mix to parallelize creation of the charts, but since writing the images to disk is going to be the most expensive part of this process, I don’t imagine it’s speeding things up terribly much. First, reading in the JSON data from a file is simple with JsonSlurper. def data new File(jsonFilename).withReader {Reader reader -> data = new JsonSlurper().parse(reader) } assert data Here’s a sample of what the JSON data looks like for one species over a single year, broken down first by one of the four major groups, then by tree species, then by year and finally by Province or Territory. { 'Provincial': [ { 'name': 'Pine', '1990': { 'NL': 583.0, 'NL_notes': '', 'PE': 52.0, 'PE_notes': '', 'NS': 4.0, 'NS_notes': '', 'NB': 4715.0, 'NB_notes': '', 'QC': 33422.0, 'QC_notes': '', 'ON': 51062.0, 'ON_notes': '', 'MB': 2985.0, 'MB_notes': '', 'SK': 4671.0, 'SK_notes': '', 'AB': 8130.0, 'AB_notes': '', 'BC': 89167.0, 'BC_notes': 'e', 'YT': '-', 'YT_notes': '', 'NT *a': 15.0, 'NT *a_notes': '', 'NU': '..', 'NU_notes': '', 'CA': 194806.0, 'CA_notes': 'e' }, ... Building the charts is a simple matter of iterating over the resulting map of parsed data. 
In this case we’re ignoring the ‘notes’ data but have included it in the dataset in case we want to use it later. We’re also just ignoring any non-numeric values. GROUPINGS.each { group -> withPool { AREAS.eachParallel { area -> ChartBuilder builder = new ChartBuilder(); String title = sanitizeName("$group-$area") TimeseriesChart chart = builder.timeserieschart(title: group, timeAxisLabel: 'Year', valueAxisLabel: 'Number of Seedlings (1000s)', legend: true, tooltips: false, urls: false ) { timeSeriesCollection { data."$group".each { species -> Set years = (species.keySet() - 'name').collect {it as int} timeSeries(name: species.name, timePeriodClass: 'org.jfree.data.time.Year') { years.sort().each { year -> final value = species."$year"."$area" //check that it's a numeric value if (!(value instanceof String)) { add(period: new Year(year), value: value) } } } } } } ... } Then we apply some additional formatting to the JFreeChart to enhance the output styling, insert an image into the background, and fix the plot color schemes. JFreeChart innerChart = chart.chart String longName = PROVINCE_SHORT_FORM_MAPPINGS.find {it.value == area}.key innerChart.addSubtitle(new TextTitle(longName)) innerChart.setBackgroundPaint(Color.white) innerChart.plot.setBackgroundPaint(Color.lightGray.brighter()) innerChart.plot.setBackgroundImageAlignment(Align.TOP_RIGHT) innerChart.plot.setBackgroundImage(logo) [Color.BLUE, Color.GREEN, Color.ORANGE, Color.CYAN, Color.MAGENTA, Color.BLACK, Color.PINK, Color.RED].eachWithIndex { color, int index -> innerChart.XYPlot.renderer.setSeriesPaint(index, color) } And we write out each of the charts to a formulaically named png file.
def fileTitle = "$FILE_PREFIX-${title}.png" File outputDir = new File(outputDirectory) if (!outputDir.exists()) { outputDir.mkdirs() } File file = new File(outputDir, fileTitle) if (file.exists()) { file.delete() } ChartUtilities.saveChartAsPNG(file, innerChart, 550, 300) To tie it all together, an html page is created using MarkupBuilder to showcase all of the results, organized by Province or Territory. def buildHtml(inputDirectory) { File inputDir = new File(inputDirectory) assert inputDir.exists() Writer writer = new StringWriter() MarkupBuilder builder = new MarkupBuilder(writer) builder.html { head { title('Number of Seedlings Planted by Ownership, Species') style(type: 'text/css') { mkp.yield(CSS) } } body { ul { AREAS.each { area -> String areaName = sanitizeName(area) div(class: 'area rounded-corners', id: areaName) { h2(PROVINCE_SHORT_FORM_MAPPINGS.find {it.value == area}.key) inputDir.eachFileMatch(~/.*$areaName\.png/) { img(src: it.name) } } } } script(type: 'text/javascript', src: 'https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js', '') script(type: 'text/javascript') { mkp.yield(JQUERY_FUNCTION) } } } writer.toString() } The generated html page assumes that all images are co-located in the same folder, presents four images per Province/Territory and, just for fun, uses jQuery to attach a click handler to each of the headers. Click on a header and the images in that div will animate into the background. I’m sure the actual jQuery being used could be improved upon, but it serves its purpose. Here’s a sample of the html output: <ul> <div class='area rounded-corners' id='NL'> <h2>Newfoundland and Labrador</h2> <img src='naturalResourcesCanadaNewSeedlings-Federal-NL.png' /> <img src='naturalResourcesCanadaNewSeedlings-PrivateLand-NL.png' /> <img src='naturalResourcesCanadaNewSeedlings-Provincial-NL.png' /> <img src='naturalResourcesCanadaNewSeedlings-Total-NL.png' /> </div> ... The resulting page looks like this in Firefox.
Source code and Links

The source code is available on GitHub. So is the final resulting html page. The entire source required to go from Excel to charts embedded in an html page comes in at slightly under 300 lines of code, and I don’t think the results are too bad for the couple of hours of effort involved. Finally, the JSON results are also hosted on the GitHub pages for the project for anyone else who might want to delve into the data. Some reading related to this topic: Groovy loves POI and POI loves Groovy; Writing batch import scripts with Grails, GSQL and GPars; GSheets – A Groovy Builder based on Apache POI; Groovy for the Office. Related links: Groovy inspect()/Eval for Externalizing Data; Groovy reverse map sort done easy; Five Cool Things You Can Do With Groovy Scripts. Reference: JFreeChart with Groovy and Apache POI from our JCG partner Kelly Robinson at The Kaptain on … stuff blog....

Spring @Configuration and FactoryBean

Consider a FactoryBean for defining a cache using a Spring configuration file: <cache:annotation-driven /> <context:component-scan base-package='org.bk.samples.cachexml'></context:component-scan> <bean id='cacheManager' class='org.springframework.cache.support.SimpleCacheManager'> <property name='caches'> <set> <ref bean='defaultCache'/> </set> </property> </bean> <bean name='defaultCache' class='org.springframework.cache.concurrent.ConcurrentMapCacheFactoryBean'> <property name='name' value='default'/> </bean> The factory bean ConcurrentMapCacheFactoryBean is a bean which is in turn responsible for creating a Cache bean. My first attempt at translating this setup to a @Configuration style was the following: @Bean public SimpleCacheManager cacheManager(){ SimpleCacheManager cacheManager = new SimpleCacheManager(); List<Cache> caches = new ArrayList<Cache>(); ConcurrentMapCacheFactoryBean cacheFactoryBean = new ConcurrentMapCacheFactoryBean(); cacheFactoryBean.setName('default'); caches.add(cacheFactoryBean.getObject()); cacheManager.setCaches(caches); return cacheManager; } This did not work, however; the reason is that I had bypassed some Spring bean lifecycle mechanisms altogether. It turns out that ConcurrentMapCacheFactoryBean also implements the InitializingBean interface and does an eager initialization of the cache in the ‘afterPropertiesSet’ method of InitializingBean. By directly calling factoryBean.getObject(), I was completely bypassing the afterPropertiesSet method. There are two possible solutions: 1.
Define the FactoryBean the same way it is defined in the XML: @Bean public SimpleCacheManager cacheManager(){ SimpleCacheManager cacheManager = new SimpleCacheManager(); List<Cache> caches = new ArrayList<Cache>(); caches.add(cacheBean().getObject()); cacheManager.setCaches(caches); return cacheManager; } @Bean public ConcurrentMapCacheFactoryBean cacheBean(){ ConcurrentMapCacheFactoryBean cacheFactoryBean = new ConcurrentMapCacheFactoryBean(); cacheFactoryBean.setName('default'); return cacheFactoryBean; } In this case, there is an explicit FactoryBean being returned from a @Bean method, and Spring will take care of calling the lifecycle methods on this bean. 2. Replicate the behavior in the relevant lifecycle methods. In this specific instance I know that the FactoryBean instantiates the ConcurrentMapCache in the afterPropertiesSet method, so I can replicate this behavior directly this way: @Bean public SimpleCacheManager cacheManager(){ SimpleCacheManager cacheManager = new SimpleCacheManager(); List<Cache> caches = new ArrayList<Cache>(); caches.add(cacheBean()); cacheManager.setCaches(caches); return cacheManager; } @Bean public Cache cacheBean(){ Cache cache = new ConcurrentMapCache('default'); return cache; } Something to keep in mind when translating a FactoryBean from xml to @Configuration.
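The lifecycle point is easy to see even without Spring. Here is a minimal plain-Java sketch (hypothetical names, not the real Spring classes) of a factory whose product only exists after afterPropertiesSet() runs, which is exactly why calling getObject() by hand returned an uninitialized cache:

```java
// A stand-in for a FactoryBean whose product is created in the
// afterPropertiesSet() lifecycle callback, the way the article describes
// ConcurrentMapCacheFactoryBean behaving.
public class SketchCacheFactory {
    private String name;
    private Object cache; // stands in for the real Cache instance

    void setName(String name) { this.name = name; }

    // The container calls this on beans returned from @Bean methods;
    // calling getObject() directly skips it entirely.
    void afterPropertiesSet() { this.cache = "cache:" + name; }

    Object getObject() { return cache; }

    public static void main(String[] args) {
        SketchCacheFactory factory = new SketchCacheFactory();
        factory.setName("default");
        System.out.println(factory.getObject()); // null - lifecycle skipped
        factory.afterPropertiesSet();
        System.out.println(factory.getObject()); // cache:default
    }
}
```

Option 1 above works because handing the whole factory to Spring lets the container make the afterPropertiesSet() call before anyone asks for the product.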
Note: A working one page test as a gist is available here: package org.bk.samples.cache; import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.Matchers.equalTo; import java.util.ArrayList; import java.util.List; import java.util.Random; import org.junit.Test; import org.junit.runner.RunWith; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.cache.Cache; import org.springframework.cache.annotation.Cacheable; import org.springframework.cache.annotation.EnableCaching; import org.springframework.cache.concurrent.ConcurrentMapCacheFactoryBean; import org.springframework.cache.support.SimpleCacheManager; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.ComponentScan; import org.springframework.context.annotation.Configuration; import org.springframework.stereotype.Component; import org.springframework.test.context.ContextConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(classes={TestSpringCache.TestConfiguration.class}) public class TestSpringCache { @Autowired TestService testService; @Test public void testCache() { String response1 = testService.cachedMethod('param1', 'param2'); String response2 = testService.cachedMethod('param1', 'param2'); assertThat(response2, equalTo(response1)); } @Configuration @EnableCaching @ComponentScan('org.bk.samples.cache') public static class TestConfiguration{ @Bean public SimpleCacheManager cacheManager(){ SimpleCacheManager cacheManager = new SimpleCacheManager(); List<Cache> caches = new ArrayList<Cache>(); caches.add(cacheBean().getObject()); cacheManager.setCaches(caches ); return cacheManager; } @Bean public ConcurrentMapCacheFactoryBean cacheBean(){ ConcurrentMapCacheFactoryBean cacheFactoryBean = new ConcurrentMapCacheFactoryBean(); cacheFactoryBean.setName('default'); return cacheFactoryBean; } } } interface 
TestService{ String cachedMethod(String param1, String param2); } @Component class TestServiceImpl implements TestService{ @Cacheable(value='default', key="#p0.concat('-').concat(#p1)") public String cachedMethod(String param1, String param2){ return 'response ' + new Random().nextInt(); } } Reference: Spring @Configuration and FactoryBean from our JCG partner Biju Kunjummen at the all and sundry blog....

Fixing Bugs that can’t be Reproduced

There are bugs that can’t be reproduced, or at least not easily: intermittent and transient errors; bugs that disappear when you try to look for them; bugs that occur as the result of a long chain of independent operations or cross-request timing. Some of these bugs are only found in high-scale production systems that have been running for a long time under heavy load. Capers Jones calls these bugs “abeyant defects” and estimates that in big systems, as much as 10% of bugs cannot be reproduced, or are too expensive to try reproducing. These bugs can cost 100x more to fix than a simple defect – an “average” bug like this can take more than a week for somebody to find (if it can be found at all) by walking through the design and code, and up to another week or two to fix.

Heisenbugs

One class of bugs that can’t be reproduced are Heisenbugs: bugs that disappear when you attempt to trace or isolate them. When you add tracing code, or step through the problem in a debugger, the problem goes away. In Debug It!, Paul Butcher offers some hope for dealing with these bugs. He says Heisenbugs are caused by non-deterministic behavior, which in turn can only be caused by: unpredictable initial state – a common problem in C/C++ code; interaction with external systems – which can be isolated and stubbed out, although this is not always easy; deliberate randomness – random factors can also be stubbed out in testing; and concurrency – the most common cause of Heisenbugs today, at least in Java. Knowing this (or at least making yourself believe it while you are trying to find a problem like this) can help you decide where to start looking for a cause, and how to go forward. But unfortunately, it doesn’t mean that you will find the bug, at least not soon.

Race Conditions

Race conditions used to be problems only for systems programmers and people writing communications handlers. But almost everybody runs into race conditions today – is anybody writing code that isn’t multi-threaded anymore?
Races are synchronization errors that occur when two or more threads or processes access the same data or resource without a consistent locking approach. Races can result in corrupted data – memory getting stomped on (especially in C/C++), changes applied more than once (balances going negative) or changes being lost (credits without debits) – inconsistent UI behaviour, random crashes due to null pointer problems (a thread references an object that has already been freed by another thread), intermittent timeouts, and actions executed out of sequence, including time-of-check time-of-use security violations. The results will depend on which thread or process wins the race a particular time. Because races are the result of unlucky timing, and because the problem may not be visible right away (e.g., something gets stepped on but you don’t know until much later, when something else tries to use it), they’re usually hard to understand and hard to make happen. You can’t fix (or probably even find) a race condition without understanding concurrency. And the fact that you have a race condition is a good sign that whoever wrote this code didn’t understand concurrency, so you’re probably not dealing with only one mistake. You’ll have to be careful in diagnosing, and especially in fixing, the bug, to make sure that you don’t change the problem from a race into a stall, thread starvation, livelock or deadlock instead, by getting the synchronization approach wrong.

Fixing bugs that can’t be reproduced

If you think you’re dealing with a concurrency problem, a race condition or a timing-related bug, try introducing long pauses between different threads – this will expand the window for races and timing-related problems to occur, which should make the problem more obvious.
This is what IBM Research’s ConTest tool does (or did – unfortunately, ConTest seems to have disappeared off of the IBM alphaWorks site), messing with thread scheduling to make deadlocks and races occur more often. If you can’t reproduce the bug, that doesn’t mean that you give up. There are still some things to look at and try. It’s often faster to find concurrency bugs, timing problems and other hard problems by working back from the error and stepping through the code – in a debugger or by hand – to build up your own model of how the code is supposed to work. “Even if you can’t easily reproduce the bug in the lab, use the debugger to understand the affected code. Judiciously stepping into or over functions based on your level of “need to know” about that code path. Examine live data and stack traces to augment your knowledge of the code paths.” Jeff Vroom, Debugging Hard Problems I’ve worked with brilliant programmers who can work back through even the nastiest code, tracing through what’s going on until they see the error. For the rest of us, debugging is a perfect time to pair up. While I’m not sure that pair programming makes a lot of sense as an everyday practice for everyday work, I haven’t seen too many ugly bugs solved without two smart people stepping through the code and logs together. It’s also important to check compiler and static analysis warnings – these tools can help point to easily-overlooked coding mistakes that could be the cause of your bug, and maybe even other bugs. FindBugs has concurrency bug pattern checkers that can point out common concurrency mistakes, as well as lots of other coding mistakes that you can miss finding on your own. There are also dynamic analysis tools that are supposed to help find race conditions and other kinds of concurrency bugs at run-time, but I haven’t seen any of them actually work. If anybody has had success with tools like this in real-world applications I would like to hear about it.
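The lost-update race described above takes only a few lines of plain Java to reproduce (a sketch written for illustration, not code from the article): two threads doing an unsynchronized read-modify-write will usually drop some increments, while an AtomicInteger never does.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RaceDemo {
    static int unsafeCounter = 0;                        // no synchronization
    static final AtomicInteger safeCounter = new AtomicInteger();

    static int[] run() throws InterruptedException {
        unsafeCounter = 0;
        safeCounter.set(0);
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCounter++;               // read-modify-write: increments can be lost
                safeCounter.incrementAndGet(); // atomic: never loses an update
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return new int[] { unsafeCounter, safeCounter.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counts = run();
        // safeCounter is always 200000; unsafeCounter is typically lower,
        // and the exact value changes from run to run - a classic Heisenbug.
        System.out.println("unsafe=" + counts[0] + " safe=" + counts[1]);
    }
}
```

Note how adding tracing or a debugger breakpoint around the increment changes the thread interleaving, which is exactly why these bugs vanish when you go looking for them.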
Shotgun Debugging, Offensive Programming, Brute-Force Debugging and other ideas

Debug It! recommends that if you are desperate, just take a copy of the code, and try changing something, anything, and see what happens. Sometimes the new information that results from your change may point you in a new direction. This kind of undirected “Shotgun Debugging” is basically wishful thinking, relying on luck and accidental success, what Andy Hunt and Dave Thomas call “Programming by Coincidence”. It isn’t something that you want to rely on, or be proud of. But there are some changes that do make a lot of sense to try. In Code Complete, Steve McConnell recommends making “offensive coding” changes: adding asserts and other run-time debugging checks that will cause the code to fail if something “impossible” happens, because something “impossible” apparently is happening. Jeff Vroom suggests writing your own debugging code: “For certain types of complex code, I will write debugging code, which I put in temporarily just to isolate a specific code path where a simple breakpoint won’t do. I’ve found using the debugger’s conditional breakpoints is usually too slow when the code path you are testing is complicated. You may hit a specific method 1000s of times before the one that causes the failure. The only way to stop in the right iteration is to add specific code to test for values of input parameters… Once you stop at the interesting point, you can examine all of the relevant state and use that to understand more about the program.” Paul Butcher suggests that before you give up trying to reproduce and fix a problem, see if there are any other bugs reported in the same area and try to fix them – even if they aren’t serious bugs. The rationale: fixing other bugs may clear up the situation (your bug was being masked by another one); and by working on something else in the same area, you may learn more about the code and get some new ideas about your original problem.
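McConnell’s “offensive coding” suggestion can be as small as this sketch (my own hypothetical example): instead of letting an “impossible” value flow onward and corrupt state, fail loudly at the point where it first appears.

```java
public class OffensiveChecks {

    // Applies a credit/debit pair to a balance. Negative amounts are
    // "impossible" by contract, so we fail fast here instead of quietly
    // producing a corrupted balance that only surfaces much later.
    static long apply(long balance, long credit, long debit) {
        if (credit < 0 || debit < 0) {
            throw new IllegalStateException(
                "impossible state: negative amount " + credit + "/" + debit);
        }
        return balance + credit - debit;
    }

    public static void main(String[] args) {
        System.out.println(apply(100, 10, 5)); // 105
        apply(0, -1, 0); // throws IllegalStateException immediately
    }
}
```

The payoff is diagnostic: when the impossible happens in production, the stack trace points at the first corrupted step rather than at some distant victim of the corruption.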
Refactoring or even quickly rewriting some of the code where you think the problem might be can sometimes help you to see the problem more clearly, especially if the code is difficult to follow. Or it could possibly move the problem somewhere else, making it easier to find. If you can’t find it and fix it, at least add defensive code: tighten up error handling and input checking, and add some logging – if something is happening that can’t happen or won’t happen for you, you need more information on where to look when it happens again.

Question Everything

Finally, The Pragmatic Programmer tells us to remember that nothing is impossible. If you aren’t getting closer to the answer, question your assumptions. Code that has always worked might not work in this particular case. Unless you have a complete trace of the problem in the original report, it’s possible that the report is incomplete or misleading – that the problem you need to solve is not the problem that you are looking at. If you’re trapped in a real debugging nightmare, look outside of your own code. The bug may not be in your code, but in underlying third party code in your service stack or the OS. In big systems under heavy load, I’ve run into problems in the operating system kernel, web servers, messaging middleware, virtual machines, and the DBMS. Your job when debugging a problem like this is to make sure that you know where the bug isn’t (in your code), and try to come up with as simple a test case as possible that shows this – when you’re working with a vendor, they’re not going to be able to set up and run a full-scale enterprise app in order to reproduce your problem. Hard bugs that can’t be reproduced can blow schedules and service levels to hell, and wear out even your best people. Taking all of this into account, let’s get back to the original problem of how or whether to estimate bug fixes, in the next post.
Reference: Fixing Bugs that can’t be Reproduced from our JCG partner Jim Bird at the Building Real Software blog....

Enterprise SOAP Services in a Web 2.0 REST/JSON World

With the popularity of JSON and other NoSQL data formats, the complexity and in some cases the plain verbosity of XML formats are being shunned. However, XML and the abilities and standards that have formed around it have become key to the enterprise and their business processes. At the same time, the needs of their customers require that they start to support multiple formats. To this end, tooling frameworks like Axis2 have started to add support for taking WSDL-based services and generating JSON responses. Enterprises need to live in this post-SOAP world, but leverage the expertise and SOA infrastructures they have developed over the years. Axis2 is one way to do it, but it doesn’t provide monitoring and policy support out of the box. Another alternative is the Turmeric SOA project from eBay. Natively, out of the box, one can take a WSDL, like the one provided by the STAR standards organization, and add support not only for SOAP 1.1/1.2, but also for REST-style services serving XML, JSON, and any other data format you would need to support. There is a catch, though: Turmeric SOA was not designed with full SOAP and the W3C web service stack in mind. It uses WSDL only to describe the operations and the data formats supported by the service. So advanced features like WS-Security, WS-Reliable Messaging, and XML Encryption are not natively built into Turmeric. Depending on your needs you will need to work with pipeline Handlers and enhance the protocol processors to support some of the advanced features. However, these are items that can be worked around, and it can interoperate with existing web services to act as a proxy. As an example, the STAR organization provides a web service specification that has been implemented in the automotive industry to provide transports for their specifications. Using a framework like Turmeric SOA, existing applications can be made available to trading partners and consumers of the service in multiple formats.
As an example, one could provide data in RESTful XML: <?xml version='1.0' encoding='UTF-8'?> <ns2:PullMessageResponse xmlns:ms="http://www.ebayopensource.org/turmeric/common/v1/types" xmlns:ns2="http://www.starstandards.org/webservices/2009/transport"> <ns2:payload> <ns2:content id="1"/> </ns2:payload> </ns2:PullMessageResponse> Or one can provide the same XML represented in a JSON format: { "jsonns.ns2": "http://www.starstandards.org/webservices/2009/transport", "jsonns.ns3": "http://www.starstandards.org/webservices/2009/transport/bindings", "jsonns.ms": "http://www.ebayopensource.org/turmeric/common/v1/types", "jsonns.xs": "http://www.w3.org/2001/XMLSchema", "jsonns.xsi": "http://www.w3.org/2001/XMLSchema-instance", "ns2.PullMessageResponse": { "ns2.payload": { "ns2.content": [ { "@id": "1" } ] } } } The above is generated from the same web service, but with just a header changed to indicate the data format that should be returned. No actual changes to the business logic or the web service implementation code itself have to be made. In Turmeric this is handled with the Data Binding Framework and its corresponding pipeline handlers. With Axis2 this is a message request configuration entry. Regardless of how it is done, it is important to be able to leverage existing services, but provide the data in a format that your consumers require. For those that are interested, I’ve created a sample STAR Web Service that can be used with Turmeric SOA to see how this works. Code is available at github. While Turmeric handles the basics of the SOAP protocol, advanced items like using and storing information in the soap:header are not as easily supported. You can get to the information, but because the WSDL in Turmeric services is there just to describe the data formats and messages, the underlying SOAP transport support is not necessarily leveraged to the full specification.
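The “same service, different wire format” idea boils down to choosing a serializer from a request header while the payload-producing business logic stays untouched. A toy sketch of that dispatch (hand-rolled serializers for illustration only; this is not the Turmeric or Axis2 API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FormatDispatcher {

    // Renders the same payload as XML or JSON depending on the requested
    // data format - the code that produced the payload never changes.
    static String render(String format, Map<String, String> payload) {
        StringBuilder sb = new StringBuilder();
        if ("JSON".equalsIgnoreCase(format)) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<String, String> e : payload.entrySet()) {
                if (!first) sb.append(',');
                sb.append('"').append(e.getKey()).append("\":\"")
                  .append(e.getValue()).append('"');
                first = false;
            }
            sb.append('}');
        } else { // default to XML
            for (Map.Entry<String, String> e : payload.entrySet()) {
                sb.append('<').append(e.getKey()).append('>')
                  .append(e.getValue())
                  .append("</").append(e.getKey()).append('>');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> payload = new LinkedHashMap<>();
        payload.put("content", "1");
        System.out.println(render("XML", payload));  // <content>1</content>
        System.out.println(render("JSON", payload)); // {"content":"1"}
    }
}
```

In a real framework the serializers are pluggable data-binding components rather than string concatenation, but the separation of concerns is the same: one service implementation, many renderings.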
Depending on your requirements, Axis2 may be better, but Turmeric SOA provides additional items like service metrics, a monitoring console, policy administration, rate limiting through XACML 2.0-based policies, and much more. If you already have existing web services written the W3C way but need to provide data in other formats, Turmeric can be used alongside them. It isn't a one-or-the-other proposition. Leverage the tools that give you the greatest flexibility to provide the data to your consumers with the least amount of effort. Reference: Enterprise SOAP Services in a Web 2.0 REST/JSON World from our JCG partner David Carver at the Intellectual Cramps blog....

Using EasyMock or Mockito

I have been using EasyMock for most of my time, but recently I worked with a few people who were strongly inclined to use Mockito. Not intending to use two frameworks for the same purpose in the same project, I adopted Mockito. So for the last couple of months I have been using Mockito, and here is my comparative analysis of the two. The people with whom I have worked cite test readability as the reason for using Mockito, but I have a different opinion on that. Suppose we have the following code that we intend to test: public class MyApp { MyService service; OtherService otherService;void operationOne() { service.operationOne(); }void operationTwo(String args) { String operationTwo = otherService.operationTwo(args); otherService.operationThree(operationTwo); }void operationThree() { service.operationOne(); otherService.operationThree("success"); } }class MyService { void operationOne() {} }class OtherService { public String operationTwo(String args) { return args; }public void operationThree(String operationTwo) {} } Now let me write a simple test case for this class using EasyMock and then using Mockito. 
public class MyAppEasyMockTest { MyApp app; MyService service; OtherService otherService;@Before public void initialize() { service = EasyMock.createMock(MyService.class); otherService = EasyMock.createMock(OtherService.class); app = new MyApp(); app.service = service; app.otherService = otherService; }@Test public void verifySimpleCall() { service.operationOne(); EasyMock.replay(service); app.operationOne(); EasyMock.verify(service); } } public class MyAppMockitoTest { MyApp app; MyService service; OtherService otherService;@Before public void initialize() { service = Mockito.mock(MyService.class); otherService = Mockito.mock(OtherService.class); app = new MyApp(); app.service = service; app.otherService = otherService; }@Test public void verifySimpleCall() { app.operationOne(); Mockito.verify(service).operationOne(); } } This is a really simple test, and I must say the Mockito one is more readable. But according to classic testing methodology, the Mockito test is not complete. We have verified the call that we are looking for, but if tomorrow I change the source code by adding one more call to service, the test would not break. void operationOne() { service.operationOne(); service.someOtherOp(); } Now this makes me feel that the tests are not good enough. But thankfully Mockito provides verifyNoMoreInteractions, which can be used to complete the test. Now let me write a few more tests for the MyApp class. 
public class MyAppEasyMockTest { @Test public void verifyMultipleCalls() { String args = "one"; EasyMock.expect(otherService.operationTwo(args)).andReturn(args); otherService.operationThree(args); EasyMock.replay(otherService); app.operationTwo(args); EasyMock.verify(otherService); }@Test(expected = RuntimeException.class) public void verifyException() { service.operationOne(); EasyMock.expectLastCall().andThrow(new RuntimeException()); EasyMock.replay(service); app.operationOne(); }@Test public void captureArguments() { Capture<String> captured = new Capture<String>(); service.operationOne(); otherService.operationThree(EasyMock.capture(captured)); EasyMock.replay(service, otherService); app.operationThree(); EasyMock.verify(service, otherService); assertTrue(captured.getValue().contains("success")); }}public class MyAppMockitoTest { @Test public void verifyMultipleCalls() { String args = "one"; Mockito.when(otherService.operationTwo(args)).thenReturn(args); app.operationTwo(args); Mockito.verify(otherService).operationTwo(args); Mockito.verify(otherService).operationThree(args); Mockito.verifyNoMoreInteractions(otherService); Mockito.verifyZeroInteractions(service); }@Test(expected = RuntimeException.class) public void verifyException() { Mockito.doThrow(new RuntimeException()).when(service).operationOne(); app.operationOne(); }@Test public void captureArguments() { app.operationThree(); ArgumentCaptor capturedArgs = ArgumentCaptor .forClass(String.class); Mockito.verify(service).operationOne(); Mockito.verify(otherService).operationThree(capturedArgs.capture()); assertTrue(capturedArgs.getValue().contains("success")); Mockito.verifyNoMoreInteractions(service, otherService); } } These are some practical scenarios of testing where we would like to assert arguments, Exceptions etc. 
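The strictness question behind the "no more interactions" checks above can be illustrated without either framework. Here is a tiny hand-rolled recording mock (a plain-Java sketch, not a substitute for EasyMock or Mockito) showing how an unexpected extra call slips past a plain verification but is caught by a strict interaction count:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal recording mock: logs every call so a test can check exactly
// which interactions happened - and that no others did.
class RecordingService {
    final List<String> calls = new ArrayList<>();

    void operationOne() { calls.add("operationOne"); }
    void someOtherOp()  { calls.add("someOtherOp"); }
}

public class StrictVerificationSketch {
    public static void main(String[] args) {
        RecordingService service = new RecordingService();

        // Code under test makes an extra call the test never expected.
        service.operationOne();
        service.someOtherOp();

        // Plain verification (Mockito-style): the expected call happened...
        boolean expectedCallHappened = service.calls.contains("operationOne");
        // ...but only a "no more interactions" style check notices the extra one.
        boolean noMoreInteractions = service.calls.size() == 1;

        System.out.println(expectedCallHappened); // true
        System.out.println(noMoreInteractions);   // false - a strict test should fail here
    }
}
```

EasyMock's replay/verify cycle enforces the strict check by default, which is exactly the behavioural difference discussed below.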
If I look at and compare the tests written using EasyMock with the ones using Mockito, I tend to feel that both are equal in readability; neither does a better job. The large number of expect-and-return calls in EasyMock makes those tests less readable, and the verify statements of Mockito often compromise test readability. As per the Mockito documentation, verifyZeroInteractions and verifyNoMoreInteractions should not be used in every test that you write, but if I leave them out of my tests then my tests are not good enough. Moreover, in tests everything should be under the control of the developer, i.e. how the interactions are happening and what interactions are happening. In EasyMock this aspect is more visible, as the developer must put down all of these interactions in his code, but in Mockito the framework takes care of all interactions and the developer is just concerned with their verification (if any). This can lead to testing scenarios where the developer is not in control of all interactions. There are a few nice things that Mockito has, like the JUnit runner that can be used to create mocks of all the required dependencies. It is a nice way of removing some of the infrastructure code, and EasyMock should have one too. @RunWith(MockitoJUnitRunner.class) public class MyAppMockitoTest { MyApp app; @Mock MyService service; @Mock OtherService otherService;@Before public void initialize() { app = new MyApp(); app.service = service; app.otherService = otherService; } } Conclusion: Since I have used both frameworks, I feel that except for simple test cases both EasyMock and Mockito lead to test cases that are equal in readability. But EasyMock is better for unit testing, as it forces the developer to take control of things. Mockito, due to its assumptions and considerations, hides this control under the carpet and thus is not a good choice. But Mockito offers certain things that are quite useful (e.g. 
the JUnit runner, call chaining), and EasyMock should have them in its next release. Reference: Using EasyMock or Mockito from our JCG partner Rahul Sharma at the The road so far… blog. ...

ADF : Dynamic View Object

Today I want to write about dynamic view objects, which allow me to change a view object's data source (SQL query) and attributes at run time. I will use the oracle.jbo.ApplicationModule :: createViewObjectFromQueryStmt method for this, and I will present how to do it step by step. Create View Object and Application Module
1- Right click on the Model project and choose New
2- Choose “ADF Business Component” from the left pane, then choose “View Object” from the list and click the “OK” button
3- Enter “DynamicVO” in “Name”, choose the “Sql Query” radio button and click the “Next” button.
4- Write “select * from dual” in the Select field and click the “Next” button until you reach the “Step 8 of 9″ window
5- Check the “Add to Application Module” check box and click the “Finish” button.
Implement Changes in Application Module
1- Open the application module “AppModule”, then open the Java tab and check the “Generate Application Module Class AppModuleImpl” check box
2- Open the AppModuleImpl.java class and add the below method for the dynamic view object public void changeDynamicVoQuery(String sqlStatement) { ViewObject dynamicVO = this.findViewObject("DynamicVO1"); dynamicVO.remove(); dynamicVO = this.createViewObjectFromQueryStmt("DynamicVO1", sqlStatement); dynamicVO.executeQuery(); }
3- Open “AppModule”, then open the Java tab and add the changeDynamicVoQuery method to the Client Interface
Test Business Component
1- Right click on AppModule in the Application Navigator and choose Run from the drop down list.
2- Right click on AppModule in the left pane and choose Show from the drop down list. Write “Select * from Emp” in the sqlStatement parameter and click the Execute button; the result will be Success.
3- Double-click on DynamicVO1 in the left pane; it will display the data of DynamicVO, showing the data returned by “Select * from Emp” which I entered, not by “Select * from dual” which was used at design time of the view object. To use dynamic view objects in ADF Faces, you should use ADF Dynamic Table or ADF Dynamic Form. 
You can download sample application from here Reference: ADF : Dynamic View Object from our JCG partner Mahmoud A. ElSayed at the Dive in Oracle blog....

Bug Fixing – to Estimate, or not to Estimate, that is the question

According to Steve McConnell in Code Complete (data from 1975-1992), most bugs don’t take long to fix. About 85% of errors can be fixed in less than a few hours. Some more can be fixed in a few hours to a few days. But the rest take longer, sometimes much longer – as I talked about in an earlier post. Given all of these factors and uncertainty, how do you estimate a bug fix? Or should you bother? Block out some time for bug fixing Some teams don’t estimate bug fixes upfront. Instead they allocate a block of time, some kind of buffer for bug fixing, as a regular part of the team’s work, especially if they are working in time boxes. Developers come back with an estimate only if it looks like the fix will require a substantial change – after they’ve dug into the code and found out that the fix isn’t going to be easy, that it may require a redesign or changes to complex or critical code that needs careful review and testing. Use a rule-of-thumb placeholder for each bug fix Another approach is to use a rough rule of thumb, a standard placeholder for every bug fix. Estimate ½ day of development work for each bug, for example. According to this post on Stack Overflow, the ½ day suggestion comes from Jeff Sutherland, one of the inventors of Scrum. This placeholder should work for most bugs. If it takes a developer more than ½ day to come up with a fix, then they probably need help and people need to know anyway. Pick a placeholder and use it for a while. If it seems too small or too big, change it. Iterate. You will always have bugs to fix. You might get better at fixing them over time, or they might get harder to find and fix once you’ve got past the obvious ones. Or you could use the data earlier from Capers Jones on how long it takes to fix a bug by the type of bug. A day or half day works well on average, especially since most bugs are coding bugs (on average 3 hours) or data bugs (6.5 hours). 
Even design bugs on average take only a little more than a day to resolve. Collect some data – and use it Steve McConnell, in Software Estimation: Demystifying the Black Art, says that it’s always better to use data than to guess. He suggests collecting time data for as little as a few weeks or maybe a couple of months on how long on average it takes to fix a bug, and using this as a guide for estimating bug fixes going forward. If you have enough defect data, you can be smarter about how to use it. If you are tracking bugs in a bug database like Jira, and if programmers are tracking how much time they spend on fixing each bug for billing or time accounting purposes (which you can also do in Jira), then you can mine the bug database for similar bugs and see how long they took to fix – and maybe get some ideas on how to fix the bug that you are working on by reviewing what other people did with other bugs before you. You can group different bugs into buckets (by size – small, medium, large, x-large – or by type) and then come up with an average estimate, and maybe even a best case, worst case and most likely case for each type. Use benchmarks For a maintenance team (a sustaining engineering or break/fix team responsible for software repairs only), you could use industry productivity benchmarks to project how many bugs your team can handle. Capers Jones in Estimating Software Costs says that the average programmer (in the US, in 2009) can fix 8-10 bugs per month (of course, if you’re an above-average programmer working in Canada in 2012, you’ll have to set these numbers much higher). Inexperienced programmers can be expected to fix 6 a month, while experienced developers using good tools can fix up to 20 per month. If you’re focusing on fixing security vulnerabilities reported by a pen tester or a scan, check out the remediation statistical data that Denim Group has started to collect, to get an idea of how long it might take to fix a SQL injection bug or an XSS vulnerability. 
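The bucketing idea above can be sketched in a few lines of Java. The sample durations below are made up purely for illustration; in practice they would come from your bug tracker:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BugFixBuckets {
    public static void main(String[] args) {
        // Hypothetical historical data: bug size -> past fix times in hours.
        Map<String, List<Double>> history = new HashMap<>();
        history.put("small",  Arrays.asList(1.0, 2.0, 3.0));
        history.put("medium", Arrays.asList(4.0, 6.0, 8.0));
        history.put("large",  Arrays.asList(16.0, 24.0, 40.0));

        // The average per bucket becomes the default estimate for the
        // next bug of that size.
        for (Map.Entry<String, List<Double>> bucket : history.entrySet()) {
            double avg = bucket.getValue().stream()
                    .mapToDouble(Double::doubleValue).average().orElse(0);
            System.out.println(bucket.getKey() + ": " + avg + "h");
        }
    }
}
```

Adding best/worst/most-likely cases per bucket is just a matter of tracking min and max alongside the average.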
So, do you estimate bug fixes, or not? Because you can’t estimate how long it will take to fix a bug until you’ve figured out what’s wrong, and most of the work in fixing a bug involves figuring out what’s wrong, it doesn’t make sense to try to do an in-depth estimate of how long it will take to fix each bug as they come up. Using simple historical data, a benchmark, or even a rough guess place holder as a rule-of-thumb all seem to work just as well. Whatever you do, do it in the simplest and most efficient way possible, don’t waste time trying to get it perfect – and realize that you won’t always be able to depend on it. Remember the 10x rule – some outlier bugs can take up to 10x as long to find and fix than an average bug. And some bugs can’t be found or fixed at all – or at least not with the information that you have today. When you’re wrong (and sometimes you’re going to be wrong), you can be really wrong, and even careful estimating isn’t going to help. So stick with a simple, efficient approach, and be prepared when you hit a hard problem, because it’s gonna happen. Reference: Bug Fixing – to Estimate, or not to Estimate, that is the question from our JCG partner Jim Bird at the Building Real Software blog....

Mahout and Scalding for poker collusion detection

While reading a very bright book on Mahout, Mahout in Action (which is a great hands-on intro to machine learning as well), one of the examples caught my attention. The authors of the book were using the well-known K-means clustering algorithm for finding similar users on stackoverflow.com, where the criterion of similarity was the set of authors of the questions/answers the users were up-/downvoting. In very simple words, the K-means algorithm iteratively finds clusters of points/vectors located close to each other in a multidimensional space. Applied to the problem of finding similar users on StackOverflow, we assume that every axis in the multi-dimensional space is a user, where the distance from zero is the sum of points awarded to the questions/answers given by other users (those dimensions are also often called “features”, where the distance is a “feature weight”). Obviously, the same approach can be applied to one of the most sophisticated problems in massively-multiplayer online poker – collusion detection. We make a simple assumption that if two or more players have played too many games with each other (taking into account that any of the players could simply have been an active player who played a lot of games with anyone), they might be in collusion. We break a massive set of players into small, tight clusters (preferably with 2-8 players in each), using the K-means clustering algorithm. In the basic implementation that we will go through further, every user is represented as a vector, where the axes are the other players that she has played with (and the weight of the feature is the number of games played together). Stage 1. 
Building a dictionary As the first step, we need to build a dictionary/enumeration of all the players involved in the subset of hand history that we analyze: // extract user ID from hand history record val userId = (playerHistory: PlayerHandHistory) => new Text(playerHistory.getUserId.toString)// Builds basic dictionary (enumeration, in fact) of all the players, participated in the selected subset of hand // history records class Builder(args: Args) extends Job(args) {// input tap is an HTable with hand history entries: hand history id -> hand history record, serialized with ProtoBuf val input = new HBaseSource("hand", args("hbasehost"), 'handId, Array("d"), Array('blob)) // output tap - plain text file with player IDs val output = TextLine(args("output"))input .read .flatMap('blob -> 'player) { // every hand history record contains the list of players, participated in the hand blob: Array[Byte] => // at the first stage, we simply extract the list of IDs, and add it to the flat list HandHistory.parseFrom(blob).getPlayerList.map(userId) } .unique('player) // remove duplicate user IDs .project('player) // leave only 'player column from the tuple .write(output)} 1003 1004 1005 1006 1007 ...Stage 2. Adding indices to the dictionary Secondly, we map user IDs to the position/index of a player in the vector. class Indexer(args: Args) extends Job(args) {val output = WritableSequenceFile(args("output"), classOf[Text], classOf[IntWritable], 'userId -> 'idx)TextLine(args("input")).read .map(('offset -> 'line) -> ('userId -> 'idx)) { // dictionary lines are read with indices from TextLine source // out of the box. For some reason, in my case, indices were multiplied by 5, so I have had to divide them tuple: (Int, String) => (new Text(tuple._2.toString) -> new IntWritable((tuple._1 / 5))) } .project(('userId -> 'idx)) // only userId -> index tuple is passed to the output .write(output)} 1003 0 1004 1 1005 2 1006 3 1007 4 ... Stage 3. 
Building vectors We build the vectors that will be passed as an input to the K-means clustering algorithm. As we noted above, every position in the vector corresponds to another player the player has played with: /** * K-means clustering algorithm requires the input to be represented as vectors. * In our case, the vector, itself, represents the player, where other users, the player has played with, are * vector axes/features (the weight of the feature is a number of games, played together) * User: remeniuk */ class VectorBuilder(args: Args) extends Job(args) {import Dictionary._// initializes dictionary pipe val dictionary = TextLine(args("dictionary")) .read .map(('offset -> 'line) -> ('userId -> 'dictionaryIdx)) { tuple: (Int, String) => (tuple._2 -> tuple._1 / 5) } .project(('userId -> 'dictionaryIdx))val input = new HBaseSource("hand", args("hbasehost"), 'handId, Array("d"), Array('blob)) val output = WritableSequenceFile(args("output"), classOf[Text], classOf[VectorWritable], 'player1Id -> 'vector)input .read .flatMap('blob -> ('player1Id -> 'player2Id)) { //builds a flat list of pairs of users that played together blob: Array[Byte] => val playerList = HandsHistoryCoreInternalDomain.HandHistory.parseFrom(blob).getPlayerList.map(userId) playerList.flatMap { playerId => playerList.filterNot(_ == playerId).map(otherPlayerId => (playerId -> otherPlayerId.toString)) } } .joinWithSmaller('player2Id -> 'userId, dictionary) // joins the list of pairs of //users that played together with // the dictionary, so that the second member of the tuple (ID of the second //player) is replaced with the index //in the dictionary .groupBy('player1Id -> 'dictionaryIdx) { group => group.size // groups pairs of players, played together, counting the number of hands } .map(('player1Id, 'dictionaryIdx, 'size) ->('playerId, 'partialVector)) { tuple: (String, Int, Int) => val partialVector = new NamedVector( new SequentialAccessSparseVector(args("dictionarySize").toInt), tuple._1) // turns a 
tuple of two users // into a vector with one feature partialVector.set(tuple._2, tuple._3) (new Text(tuple._1), new VectorWritable(partialVector)) } .groupBy('player1Id) { // combines partial vectors into one vector that represents the number of hands, //played with other players group => group.reduce('partialVector -> 'vector) { (left: VectorWritable, right: VectorWritable) => new VectorWritable(left.get.plus(right.get)) } } .write(output)} 1003 {3:5.0,5:4.0,6:4.0,9:4.0} 1004 {2:4.0,4:4.0,8:4.0,37:4.0} 1005 {1:4.0,4:5.0,8:4.0,37:4.0} 1006 {0:5.0,5:4.0,6:4.0,9:4.0} 1007 {1:4.0,2:5.0,8:4.0,37:4.0} ...The entire workflow, required to vectorize the input: val conf = new Configuration conf.set("io.serializations", "org.apache.hadoop.io.serializer.JavaSerialization," + "org.apache.hadoop.io.serializer.WritableSerialization")// the path, where the vectors will be stored to val vectorsPath = new Path("job/vectors") // enumeration of all users involved in a selected subset of hand history records val dictionaryPath = new Path("job/dictionary") // text file with the dictionary size val dictionarySizePath = new Path("job/dictionary-size") // indexed dictionary (every user ID in the dictionary is mapped to an index, from 0) val indexedDictionaryPath = new Path("job/indexed-dictionary")println("Building dictionary...") // extracts IDs of all the users, participating in selected subset of hand history records Tool.main(Array(classOf[Dictionary.Builder].getName, "--hdfs", "--hbasehost", "localhost", "--output", dictionaryPath.toString)) // adds index to the dictionary Tool.main(Array(classOf[Dictionary.Indexer].getName, "--hdfs", "--input", dictionaryPath.toString, "--output", indexedDictionaryPath.toString)) // calculates dictionary size, and stores it to the FS Tool.main(Array(classOf[Dictionary.Size].getName, "--hdfs", "--input", dictionaryPath.toString, "--output", dictionarySizePath.toString))// reads dictionary size val fs = FileSystem.get(dictionaryPath.toUri, conf) val 
dictionarySize = new BufferedReader( new InputStreamReader( fs.open(new Path(dictionarySizePath, "part-00000")) )).readLine().toIntprintln("Vectorizing...") // builds vectors (player -> other players in the game) // IDs of other players (in the vectors) are replaced with indices, taken from dictionary Tool.main(Array(classOf[VectorBuilder].getName, "--hdfs", "--dictionary", dictionaryPath.toString, "--hbasehost", "localhost", "--output", vectorsPath.toString, "--dictionarySize", dictionarySize.toString))Stage 4. Generating n random clusters Random clusters/centroids are the entry point for the K-means algorithm: // randomly selected clusters that will be passed as an input to K-means val inputClustersPath = new Path("job/input-clusters") val distanceMeasure = new EuclideanDistanceMeasure println("Making random seeds...") // build 30 initial random clusters/centroids RandomSeedGenerator.buildRandom(conf, vectorsPath, inputClustersPath, 30, distanceMeasure)Stage 5. Running the K-means algorithm With every iteration, K-means will find better centroids and clusters. 
As a result, we have 30 clusters of players that played with each other the most often: // clusterization results val outputClustersPath = new Path("job/output-clusters") // textual dump of clusterization results val dumpPath = "job/dump"println("Running K-means...") // runs K-means algorithm with up to 20 iterations, to find clusters of colluding players (assumption of collusion is // made on the basis of the number of hands played together with other player[s]) KMeansDriver.run(conf, vectorsPath, inputClustersPath, outputClustersPath, new CosineDistanceMeasure(), 0.01, 20, true, 0, false)println("Printing results...")// dumps clusters to a text file val clusterizationResult = finalClusterPath(conf, outputClustersPath, 20) val clusteredPoints = new Path(outputClustersPath, "clusteredPoints") val clusterDumper = new ClusterDumper(clusterizationResult, clusteredPoints) clusterDumper.setNumTopFeatures(10) clusterDumper.setOutputFile(dumpPath) clusterDumper.setTermDictionary(new Path(indexedDictionaryPath, "part-00000").toString, "sequencefile") clusterDumper.printClusters(null)Results Let’s go to “job/dump”, now – this file contains textual dumps of all clusters, generated by K-means. Here’s a small fragment of the file: VL-0{n=5 c=[1003:3.400, 1006:3.400, 1008:3.200, 1009:3.200, 1012:3.200] r=[1003:1.744, 1006:1.744, 1008:1.600, 1009:1.600, 1012:1.600]} Top Terms: 1006 => 3.4 1003 => 3.4 1012 => 3.2 1009 => 3.2 1008 => 3.2 VL-15{n=1 c=[1016:4.000, 1019:3.000, 1020:3.000, 1021:3.000, 1022:3.000, 1023:3.000, 1024:3.000, 1025:3.000] r=[]} Top Terms: 1016 => 4.0 1025 => 3.0 1024 => 3.0 1023 => 3.0 1022 => 3.0 1021 => 3.0 1020 => 3.0 1019 => 3.0 As we can see, 2 clusters of players have been detected: one with 8 players that have played a lot of games with each other, and the second with 4 players. Reference: Poker collusion detection with Mahout and Scalding from our JCG partner Vasil Remeniuk at the Vasil Remeniuk blog....
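The similarity measure behind this clustering can be sketched in plain Java (a toy illustration of the idea, not Mahout's implementation): each player is a sparse vector of co-play counts, and players who repeatedly sat at the same tables end up close together. The player IDs and counts below mirror the vectors shown earlier:

```java
import java.util.HashMap;
import java.util.Map;

public class PlayerSimilarity {
    // Cosine similarity between two sparse "games played together" vectors.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) normB += v * v;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Players 1003 and 1006 played mostly with the same opponents...
        Map<Integer, Double> p1003 = new HashMap<>();
        p1003.put(5, 4.0); p1003.put(6, 4.0); p1003.put(9, 4.0);
        Map<Integer, Double> p1006 = new HashMap<>(p1003);
        // ...while player 1004 played with a completely different group.
        Map<Integer, Double> p1004 = new HashMap<>();
        p1004.put(2, 4.0); p1004.put(4, 4.0); p1004.put(8, 4.0);

        System.out.println(cosine(p1003, p1006)); // ~1.0 - identical co-play pattern
        System.out.println(cosine(p1003, p1004)); // 0.0 - no common opponents
    }
}
```

This is exactly why CosineDistanceMeasure groups 1003 and 1006 into the same cluster in the dump above, while 1004 lands elsewhere.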

Hibernate caches basics

Recently I have experimented with the Hibernate cache. In this post I would like to share my experience and point out some of the details of the Hibernate second level cache. Along the way I will direct you to some articles that helped me implement the cache. Let’s get started from the ground up. Caching in Hibernate Caching functionality is designed to reduce the amount of necessary database access. When objects are cached, they reside in memory. You have the flexibility to limit the usage of memory and store the items in disk storage. The implementation will depend on the underlying cache manager. There are various flavors of caching available, but it is better to cache non-transactional and read-only data. Hibernate provides 3 types of caching. 1. Session Cache The session cache caches objects within the current session. It is enabled by default in Hibernate. Read more about the Session Cache. Objects in the session cache reside in the same memory location. 2. Second Level Cache The second level cache is responsible for caching objects across sessions. When this is turned on, objects will first be searched for in the cache, and if they are not found, a database query will be fired. Read here on how to implement the Second Level Cache. The second level cache will be used when objects are loaded using their primary key. This includes fetching of associations. In the case of the second level cache the objects are constructed, and hence all of them will reside in different memory locations. 3. Query Cache The query cache is used to cache the results of a query. Read here on how to implement the query cache. When the query cache is turned on, the results of the query are stored against the combination of the query and its parameters. Every time the query is fired, the cache manager checks for the combination of parameters and query. If the results are found in the cache they are returned; otherwise a database transaction is initiated. 
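The query-cache behaviour just described can be sketched with a plain map (a toy model to show the keying scheme, not Hibernate's actual implementation): results are stored against the combination of the query string and its parameter values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueryCacheSketch {
    // Toy query cache: key = query text + parameter values.
    static final Map<String, List<String>> cache = new HashMap<>();

    static List<String> find(String query, Object... params) {
        String key = query + Arrays.toString(params);
        return cache.computeIfAbsent(key, k -> {
            System.out.println("cache miss -> hitting the database for " + k);
            return runOnDatabase(query, params); // stand-in for a real DB call
        });
    }

    static List<String> runOnDatabase(String query, Object... params) {
        return Arrays.asList("row1", "row2"); // dummy result set
    }

    public static void main(String[] args) {
        find("from Employee where dept = ?", "sales"); // miss - goes to the DB
        find("from Employee where dept = ?", "sales"); // hit - served from memory
        find("from Employee where dept = ?", "hr");    // new parameter value -> new cache entry
        System.out.println(cache.size()); // 2
    }
}
```

Every distinct parameter combination gets its own entry, which is why caching a query with many parameters (or a parameter with many possible values) multiplies memory usage.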
As you can see, it is not a good idea to cache a query if it has a number of parameters, or if a single parameter can take a number of values. For each of these combinations the results are stored in memory. This can lead to extensive memory usage. Finally, here is a list of good articles written on this topic: 1. Speed Up Your Hibernate Applications with Second-Level Caching 2. Hibernate: Truly Understanding the Second-Level and Query Caches 3. EhCache Integration with Spring and Hibernate. Step by Step Tutorial 4. Configuring Ehcache with hibernate Reference: All about Hibernate Second Level Cache from our JCG partner Manu PK at the The Object Oriented Life blog....
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd