As companies become more interested in exposing web services to get data to mobile apps, the question always comes up – “What format should we use for the services?” It seems like JSON is the default answer these days – but enterprise companies often have existing investments in XML-based web services, or they’re using some enterprise web services stack, middleware, or ESB that doesn’t have great REST/JSON support. JSON is clearly the lighter format, but I don’t think it should be the default answer without some analysis. SOAP/XML and REST/JSON each have their own advantages and disadvantages, and work in different ways across various mobile platforms with regard to speed of application development, message size, and performance.
By “performance”, I mean the speed of serializing and deserializing the formats. After some quick serialization tests pitting JSON.NET against the WCF DataContractSerializer (the default .NET SOAP serializer), it appeared that the DataContractSerializer was roughly twice as fast when serializing 50,000 normal objects. The average JSON.NET serialization time was 0.976 seconds, and the average DataContractSerializer serialization time was 0.48 seconds, or about 49% of the JSON.NET time.
However, when deserializing the same objects the results are reversed. For deserialization, it appeared that JSON.NET was about 1.7 times as fast as the DataContractSerializer. The average JSON.NET deserialization time was 1.27 seconds and the average DataContractSerializer deserialization time was 2.16 seconds, so JSON.NET took about 59% of the DataContractSerializer time.
It would appear that in standard .NET Framework serialization, XML/SOAP wins for serialization and JSON wins for deserialization. It seems that there isn’t a massive difference either way if you’re working with reasonable message sizes (i.e. not 50k objects), and serialization performance should not weigh heavily on the decision on which format to use unless there are metrics specific to the chosen platform / framework / libraries.
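A timing harness along the following lines could be used to repeat this kind of comparison on your own platform. It is only a sketch: the `SampleItem` type, the `SerializerTiming` class, and the run count of 5 are assumptions made for illustration, since the original test objects and harness are not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical sample type; the objects used in the original test are not shown.
[DataContract]
public class SampleItem
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}

public static class SerializerTiming
{
    // Runs the action the given number of times and returns the mean elapsed seconds.
    public static double AverageSeconds(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");
        var watch = new Stopwatch();
        for (int i = 0; i < runs; i++)
        {
            watch.Start();   // Stopwatch accumulates elapsed time across Start/Stop pairs
            action();
            watch.Stop();
        }
        return watch.Elapsed.TotalSeconds / runs;
    }

    public static List<SampleItem> BuildItems(int count)
    {
        var items = new List<SampleItem>(count);
        for (int i = 0; i < count; i++)
        {
            items.Add(new SampleItem { Id = i, Name = "Item " + i });
        }
        return items;
    }

    public static void Run()
    {
        List<SampleItem> items = BuildItems(50000);
        var serializer = new DataContractSerializer(typeof(List<SampleItem>));

        double xmlSeconds = AverageSeconds(() =>
        {
            using (var stream = new MemoryStream())
            {
                serializer.WriteObject(stream, items);
            }
        }, 5);

        // The JSON.NET side would be timed the same way, for example
        // AverageSeconds(() => JsonConvert.SerializeObject(items), 5),
        // which requires a reference to the Newtonsoft.Json package.
        Console.WriteLine("DataContractSerializer: {0:F3} s", xmlSeconds);
    }
}
```

Averaging over several runs smooths out one-off costs such as JIT compilation, though the first run will still skew a small sample.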
The results could easily be different on other platforms or with other serializers. Anecdotally, in the .NET Compact Framework on Windows Mobile – JSON deserialization was orders of magnitude faster than XML deserialization. I had an instance where deserializing around 7000 XML objects took 5+ minutes (due to poor reflection performance) and deserializing the same objects with JSON took only seconds.
Message size is important for reducing the amount of time it takes to send and receive data in your mobile app. Given the verbosity of the format, SOAP-based XML messages are much larger after encoding than JSON-encoded messages. However, after gzip compression (commonly available and used on most mobile web browsers and web servers), the sizes of the messages are usually comparable.
After doing some serialization tests with 500 objects, again using JSON.NET and the WCF DataContractSerializer – the uncompressed size of the SOAP message was 257 KB and the JSON message size was 82 KB – about a third of the size. However, after gzip compression the size of the SOAP message was 12.3 KB and the JSON message was 12 KB – almost no difference in size. Further, with current cellular data connections (e.g. 3G/4G) – even a large message (300 KB+) can be received in less than a second on a mobile device with a good connection.
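To check compressed sizes for your own payloads, a small helper like the one below may be enough. This is a sketch using the framework's `GZipStream`; the `PayloadSize` class and `GzipLength` method are names invented for the example, and a web server's gzip settings may produce slightly different sizes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class PayloadSize
{
    // Returns the number of bytes the payload occupies after gzip compression.
    public static long GzipLength(string payload)
    {
        byte[] raw = Encoding.UTF8.GetBytes(payload);
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the gzip stream
            // is disposed, which is when the final compressed bytes are flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.Length;
        }
    }
}
```

Calling `PayloadSize.GzipLength` on the serialized SOAP and JSON strings for the same objects gives a direct comparison of what would actually travel over the wire.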
It would appear that JSON only has a slight edge on message size if gzip is being used, and a major edge if gzip is not being used. If the absolute best possible performance is needed for a mobile app, then I’d go with JSON – otherwise, one of the few other differentiators between the two formats would be the ease of use across platforms.
One thing that SOAP/XML is actually good at is defining explicit contracts. This allows the service and the consumer to agree beforehand on what type of messages will be sent and received, and most service consumers will be able to automatically generate code to create service client proxies. For example, in Visual Studio (Right click > Add Service Reference), MonoDevelop, or Eclipse – service clients can be easily created and updated via code generation if SOAP/XML is being used for the service. This allows for more rapid application development as it’s easier to maintain service references as the service evolves during the development cycle. Mobile development platforms with this capability include MonoTouch (iOS/C#), Mono for Android (Android/C#), and Windows Phone 7, i.e. the .NET-based mobile platforms. REST and JSON aren’t impossible to work with on the .NET-based platforms, but I prefer to leave the client-side web service plumbing to code generation if possible.
It is usually more difficult (and there are fewer libraries available) to consume SOAP/XML based services on iOS (using Objective-C), Android (using Java), and mobile web development platforms (i.e. using jQuery mobile, SenchaTouch, etc.) – there are limited options available for automatically creating SOAP/XML service clients on these platforms (half-baked, buggy, unsupported, or not-updated-in-3-years efforts aside). I’d never want to maintain a SOAP/XML client “manually”, or write the client by hand. It’s usually pretty easy to consume SOAP services with Java – as it’s just as much a standard in Java as it is in .NET, but for some reason it’s a pain in Android. REST and JSON are generally just much easier to work with on the aforementioned platforms given the currently available tools.
In the end, the decision comes down to a few things: performance and message size (with negligible impact either way), and ease of use on the mobile platform of choice… in summary:
REST/JSON
SOAP/XML
As mobility has exploded onto the list of top IT priorities over the past few years, many IT shops are realizing that one of the first steps in creating mobile apps that make use of data is figuring out how to properly expose that data via web services. Beyond the standard technical recommendations – there are some common architectural design principles to be aware of when designing and building services – whether they’re internal, enterprise services or public-facing mobile web services.
These principles are well documented throughout the internet, but I found it difficult to find a simple, concise, easily digestible format that wasn’t quickly treading into pie-in-the-sky architecture speak. Careful consideration should be given to these principles early in the design process in order to mitigate the downstream impact of poor architectural decisions. Most of these principles are also beneficial to other areas of object-oriented software development (e.g. loose coupling) – however, the principles below will become readily apparent and relevant when building services.
The six principles are:
Several of the six principles (autonomy, statelessness) also assist in the creation of loosely coupled services. A loosely coupled service requires limited knowledge of the definition or inner workings of other components relied upon by the service. The benefit of loose coupling is the flexibility and agility gained by being able to change the inner workings of a service as needed, without having to make related changes to clients or other components relying upon the service. There are a variety of ways in which services can be tightly coupled – for example: contracts, security, technology, and statefulness.
Coupling via the service contract can occur when the contracts are built against existing logic from back-end systems (i.e. using ORM-generated classes that contain every single database field), which can hinder the evolution of a contract as it has not been designed independently of the underlying logic. Coupling via security or technology can occur when a service is based upon security or communication protocols that limit the adoption of the service – for example, a service built upon an outdated technology (i.e. CORBA or .NET Remoting), or a rarely-used security protocol which may not work across all devices. Coupling via state can occur when service operations must be called in a specific order, coupling the service with the order of operations and session state that must be maintained on the server. Coupling against state can also have a negative impact – for example, a step cannot be added into the middle of a process without updating all service clients.
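One common way to avoid coupling the contract to back-end classes is to define a separate contract type and map to it. The sketch below illustrates the idea only; `CustomerRecord`, `CustomerContract`, and `CustomerMapper` are hypothetical names, not types from any particular ORM or service.

```csharp
using System;

// Hypothetical ORM-generated class that mirrors every database column.
public class CustomerRecord
{
    public int CustomerId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string PasswordHash { get; set; }
    public DateTime LastModifiedUtc { get; set; }
}

// Contract type designed for service consumers, independent of the table layout.
public class CustomerContract
{
    public int Id { get; set; }
    public string DisplayName { get; set; }
}

public static class CustomerMapper
{
    // Copies only the fields consumers need; internal columns never leave the server.
    public static CustomerContract ToContract(CustomerRecord record)
    {
        if (record == null) throw new ArgumentNullException("record");
        return new CustomerContract
        {
            Id = record.CustomerId,
            DisplayName = (record.FirstName + " " + record.LastName).Trim()
        };
    }
}
```

With this split, a column can be added to or renamed in the database without changing what clients see, as long as the mapper is updated.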
Service autonomy is a design principle that allows for services to operate more reliably by having maximum control over their execution environment and relying as little as possible on the availability of external resources over which they have no control. Autonomy can be increased by running services on dedicated hardware (reducing dependence on other systems running on the same hardware) or by storing cached copies of data, thus reducing dependence on external databases. This is not possible in every scenario, but it is good to keep in mind.
Requiring that service operations be called in a specific order increases coupling via an implicit contract – meaning that the order in which the operations should be called is not known and documented by the service itself, but rather must be documented via outside knowledge. Reducing state within services allows for services to be more scalable, as the amount of resources consumed by the service to manage and track state information does not increase with the number of consumers. The most common way to reduce service state is to manage all state at the consumption level – forcing the client to keep track of its own state.
Service consumers should rely only upon a service’s explicit contract to invoke and interact with the service. Explicit contracts are those defined by the service itself (e.g. via a WSDL document), as opposed to implicit contracts such as external documentation.
Several best practices regarding contracts include:
Service composability is a design principle which encourages services to be designed in a way that they can be consumed by multiple external systems. This promotes the reusability and agility within a service-oriented environment, as it allows new systems to be built by re-using existing services. Composability is further enabled by several of the other design principles including Autonomy (increases reliability of the service such that it can be used in other systems) and Statelessness (allows the services to be used in conjunction with other services without regard for state).
Discoverability is a design principle that encourages services to be discovered more easily by adding metadata about how the service should be invoked and interacted with, and storing that metadata in a central repository if possible. By making the services more easily discoverable and cataloging the related information, the services are inherently more interoperable and can be re-used more easily. The core consideration for this principle is that the information catalogued for the services needs to be both consistent and meaningful. The information recorded about the services should be accessible by both technical and non-technical users, allowing anyone to evaluate the capabilities of the service and whether or not it should be used.
While not all of the services design principles can be followed to the letter in every real-world services implementation, the guidelines can generally be applied regardless of the scenario. The following recommendations should be observed with regard to building well-designed services.
When developing mobile applications, there are a number of key challenges where architecture and design are fundamentally different from that of a typical enterprise application. Careful consideration should be given to these mobile architecture issues early in the development process in order to mitigate the downstream impact of poor architectural decisions. While some of these best practices also make sense for the development of non-mobile applications, many will become more readily apparent when developing on a mobile platform. The five most important areas for consideration, which are detailed throughout this document, include: performance, usability, data access, security, and connectivity.
While more readily apparent in the earlier years of mobile development, the computing power available on mobile devices still lags behind desktop and server counterparts and will continue to do so for the foreseeable future due to smaller device footprints and resource constraints. Even the most recent devices still boast only about one third to one half of the computing resources (CPU, RAM) of a low end desktop computer. Further, the quality of data connections available on a mobile device is often highly variable based on signal strength and is far inferior to broadband Internet access in most cases.
Often during rapid application development, performance considerations are ignored until the end of the project and optimized only when necessary. In mobile development, more consideration to performance constraints of the mobile device may need to be given up front in the design process. Each platform has different code-level best practices for performance optimization depending upon the programming language and frameworks available on the platform. Some best practices, such as judicious usage of memory and limits on the number of unnecessary objects created, however, can be applied across all platforms.
Care should especially be given to architectural decisions that can limit performance and are also difficult to change later in the development cycle, such as the design of web service APIs and data formats. General best practices for the design of web service APIs for use in mobile development could be summed up as:
These considerations stem mostly from the limited bandwidth available to mobile devices. If possible, APIs used by a mobile application should be designed to retrieve only the most relevant and useful information – excluding any extra data that is not used by the application. When designing APIs to communicate with mobile applications, one recommendation is to use a lightweight data format like JSON instead of a more verbose format such as XML in order to make the best use of limited bandwidth available to mobile devices. The use of a lightweight format like JSON will conserve bandwidth, will allow results to be retrieved more quickly, and also will generally enable faster deserialization of the data as it arrives on the mobile client.
Another important performance consideration on a mobile device is battery life. If an application is constantly polling a web service for updates or continually processing data in the background, the battery will be drained much more quickly. If architecturally feasible (and if the push notification capabilities exist on the mobile platform), the use of push notifications for providing data updates is recommended over periodic polling. Push notification capabilities currently exist on the iPhone, Android, and Windows Phone 7 platforms. If an application needs to perform large amounts of data processing or analysis, consider uploading the necessary data to a server-side platform to perform the CPU-intensive processing and then return the results to the device to avoid draining the battery and to provide a more-responsive user experience.
At the end of the day, usability is one of the key factors that will truly make or break user acceptance of an application. Each of the major mobile platform software vendors (Microsoft, Google, Apple) has released user-experience specifications and guidelines specific to its own platform in an attempt to foster a consistent look and feel across all applications on that platform – and if the guidelines are enforced by the vendor and followed by developers, then the payoff is absolutely realized. The user experience across applications on most of the major platforms is seamless – for example, on the more stringent iPhone and Windows Phone 7 platforms, the navigation of menus and the look and feel of most applications (down to the fonts and color schemes) are almost identical. This allows users to learn quickly how to use a new application and instead focus on performing the task at hand, rather than “switching gears” between disparate experiences or puzzling over how to interact with a new application. Below are links to the user experience guidelines for each of the major platforms:
While each platform may have specific user interface (UI) guidelines, the challenges of mobile application usability are ubiquitous and many best practices can be applied across all platforms. Following are a few of the most important usability considerations.
One commonality between the most modern mobile platforms (iPhone, Android, Windows Phone 7) is that none of them offer any capability to connect directly to a database – for good reason. The current mobile architecture paradigm simply doesn’t support this scenario for modern database platforms in their current state. Given that most mobile applications communicate over the public Internet, access to a database would require exposing that database publicly – and in this age, no sane IT or database administrator would publicly expose an instance of Oracle, SQL Server, or MySQL outside the firewall without measures like a VPN or IP restrictions in place. While VPNs are becoming more available on modern mobile platforms, the complexities around cost, bandwidth, and end-user configuration simply don’t make business sense when compared with fronting a database with a more secure web service front-end.
Rather than attempting to provide support for database client connectivity, the current paradigm for data access from mobile applications is based around web services. For the example scenario of extending a common two-tier enterprise application onto a mobile platform, usually a web services layer would first have to be created that would exist in front of the database or APIs of the enterprise application. In the design of a web services layer for a mobile application, logic around authentication, authorization, validation, and business rules should all be executed on the server-side web services of the extended application. As the web services are now exposed publicly for use by any properly authenticated user of your application, the validity of the data and the user’s right to call the web service cannot be trusted without first performing additional server-side checks. Logic for validation and authorization can be duplicated on the mobile client side of the application to provide a more responsive user experience, but the user’s actions should be checked again on the server side after the data is passed to the web service.
The architecture diagram below depicts how an enterprise application could be extended onto a mobile platform by wrapping either the application’s APIs or database with a business layer that performs additional processing for validation and security. Note that if validation or authorization is built into the enterprise application’s APIs or data access mechanism, then it is not necessary to re-implement this functionality within the web services layer.
As previously mentioned, data access on mobile platforms generally requires some form of Internet-facing service or data access point that can be communicated with via a mobile device. Database servers and platforms in their current state are not good candidates for public exposure without additional layers of security that are generally not feasible or cost effective on mobile devices. Web servers are generally more hardened to attack and, thus, web services are an excellent candidate for exposure outside the firewall to mobile devices over the Internet. But what about securing these web services?
In most cases, the use of a web service API first requires authentication to ensure that the caller of the web services is who they say they are. Usually, web service API security will use a form of token-based authentication – this could be something like OAuth or as simple as sessions built into any modern server-side framework, such as ASP.NET or Ruby on Rails. In the general workflow of token-based authentication, the web service caller sends a username and password and then receives a unique token back after their identity has been verified by the authentication service (e.g. LDAP). The token is then passed back to the web service on all subsequent requests and can be used on the server side to determine the identity of the user. Depending upon the security constraints of the application, the token generally expires after a certain period of inactivity. Regardless of the technology used to accomplish the token-based authentication, all communication between the mobile client and the web server should be performed over an SSL-secured connection in order to prevent the token from being captured via packet sniffing on a wireless connection or any other “man-in-the-middle” attack. If the token were to be compromised by a third party, the third party would then be able to imitate the identity of the actual application user and would be able to make malicious requests, if inclined.
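The token workflow described above can be sketched on the server side roughly as follows. This is an illustration under assumptions, not a production design: `TokenStore` is a hypothetical in-memory class, credential verification (e.g. against LDAP) is assumed to have happened before `Issue` is called, and the current time is passed in explicitly to keep the example easy to test.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public class TokenStore
{
    private class Session
    {
        public string UserName;
        public DateTime LastSeenUtc;
    }

    private readonly Dictionary<string, Session> sessions = new Dictionary<string, Session>();
    private readonly TimeSpan idleTimeout;
    private readonly object gate = new object();

    public TokenStore(TimeSpan idleTimeout)
    {
        this.idleTimeout = idleTimeout;
    }

    // Call only after the user's credentials have been verified elsewhere.
    public string Issue(string userName, DateTime nowUtc)
    {
        byte[] bytes = new byte[32];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes); // cryptographically strong random token
        }
        string token = Convert.ToBase64String(bytes);
        lock (gate)
        {
            sessions[token] = new Session { UserName = userName, LastSeenUtc = nowUtc };
        }
        return token;
    }

    // Returns the user name for a live token, or null if it is unknown or expired.
    public string Validate(string token, DateTime nowUtc)
    {
        lock (gate)
        {
            Session session;
            if (token == null || !sessions.TryGetValue(token, out session)) return null;
            if (nowUtc - session.LastSeenUtc > idleTimeout)
            {
                sessions.Remove(token); // expired after a period of inactivity
                return null;
            }
            session.LastSeenUtc = nowUtc; // activity resets the inactivity window
            return session.UserName;
        }
    }
}
```

Each web service operation would call `Validate` with the token from the request before doing any work, and reject the call when it returns null.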
Another security issue inherent to mobile platforms is the security of data that exists locally on the device itself. Obviously, any mobile device can be compromised much more easily than a server residing within a secure data center. If possible, confidential data should not be stored on the mobile device itself and should be stored instead on a back-end server and downloaded to the device when necessary. If for architectural reasons confidential data must be stored on the device, then measures should be taken to encrypt the data with a key that is not stored on the device, if possible. Fortunately, mobile platform vendors are providing more and more support for automatically encrypted disk storage, which makes implementation of secure data storage on the device much easier. One further consideration for mobile data storage security is that highly confidential data, such as protected health information (PHI), should not be stored on a mobile device under any circumstances, encrypted or otherwise.
The final major architecture consideration for mobile applications is connectivity. It can no longer be assumed that the application being built will have access to an “always-on” high-speed Internet connection. In the wild, mobile devices will frequently switch between different types of connections (e.g. EDGE, 3G, or WiFi) with wildly varying speeds and will often have no data connection. Often, the implementation of offline access for a mobile application simply doesn’t make sense from a business or architectural standpoint – perhaps the application must have access to only the most relevant and up-to-date data (e.g., traffic conditions), or when data is persisted it must be immediately validated and processed (e.g., stock trades). For most business applications, however, there are use cases for which offline access is absolutely necessary in order to maintain the end user’s productivity. One simple way to design offline access and data synchronization involves the creation of two basic components within the mobile application – a caching mechanism and a queuing mechanism.
The caching component handles offline access for data that would normally be retrieved as needed from a server-side API. The caching component can be designed to periodically (in a background thread) retrieve larger data sets that are potentially relevant to the needs of the end user, or it can be designed to only keep copies of any data previously retrieved from the server. Data stored in the cache on the device should generally expire after a certain time period has passed in which the data is no longer useful or relevant. Another feature that can be designed into the caching component is some level of intelligence related to the current type of connection on the device. For instance, if the cache is designed to periodically download large data sets, then perhaps it will only do so when the device is connected to WiFi in order to conserve bandwidth when connected to slower connection types. The implementation of a caching component can also provide the benefit of a more responsive user experience, as data can then be retrieved from the local cache rather than round-tripping to a server over a slow connection.
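The expiry behavior of such a cache can be sketched in a few lines. The `ExpiringCache<T>` class below is a hypothetical, in-memory illustration; it is not thread-safe, it does not persist to device storage, and the background refresh and connection-type checks described above are left out.

```csharp
using System;
using System.Collections.Generic;

public class ExpiringCache<T>
{
    private class Entry
    {
        public T Value;
        public DateTime StoredUtc;
    }

    private readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();
    private readonly TimeSpan maxAge;

    public ExpiringCache(TimeSpan maxAge)
    {
        this.maxAge = maxAge;
    }

    // Stores a copy of data retrieved from the server along with the time it was stored.
    public void Put(string key, T value, DateTime nowUtc)
    {
        entries[key] = new Entry { Value = value, StoredUtc = nowUtc };
    }

    // Returns true and the cached value if it exists and has not expired.
    public bool TryGet(string key, DateTime nowUtc, out T value)
    {
        Entry entry;
        if (entries.TryGetValue(key, out entry))
        {
            if (nowUtc - entry.StoredUtc <= maxAge)
            {
                value = entry.Value;
                return true;
            }
            entries.Remove(key); // stale data is dropped rather than served
        }
        value = default(T);
        return false;
    }
}
```

A service client would try the cache first and fall back to the web service on a miss, storing the fresh result with `Put`.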
The queuing component handles the persistence of data to the back-end services. The queuing component can be designed to sit in front of the web service API client within the mobile application and check to see whether or not a connection is available when attempting to call the web services. If a data connection is unavailable, then the update is placed into a first-in, first-out queue in memory. The queue should then periodically check (in a background thread) to see if a connection is available and then send all data updates to the back-end services in the order in which they were received. The queue should also be designed with business logic around the reconciliation of data conflicts. For example, if a data update is sent to the server and is determined to be out of date or invalid, then the end user should be notified of the error and given a mechanism to correct or discard the update. Another feature that should be designed into the queue is the persistence of the queue to local data storage on the device; if the application is closed or interrupted, then the queued updates will be kept safe until the next time the application is used. Below is a depiction of a mobile architecture using local caching and queuing services to provide offline data access and data synchronization.
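In code, the core of the queuing component described above might look like the following sketch. `OfflineQueue<T>` and its delegate parameters are hypothetical names; persistence of the queue to local storage, the background timer that calls `Flush`, and conflict reconciliation are deliberately omitted to keep the example short.

```csharp
using System;
using System.Collections.Generic;

public class OfflineQueue<T>
{
    private readonly Queue<T> pending = new Queue<T>();
    private readonly Func<bool> isConnected;
    private readonly Action<T> send;

    // isConnected reports whether a data connection is available;
    // send pushes one update to the back-end web service.
    public OfflineQueue(Func<bool> isConnected, Action<T> send)
    {
        if (isConnected == null) throw new ArgumentNullException("isConnected");
        if (send == null) throw new ArgumentNullException("send");
        this.isConnected = isConnected;
        this.send = send;
    }

    public int PendingCount
    {
        get { return pending.Count; }
    }

    // Always enqueue first so that older pending updates are sent before newer ones.
    public void Submit(T update)
    {
        pending.Enqueue(update);
        Flush();
    }

    // Sends queued updates in first-in, first-out order while a connection is available.
    public void Flush()
    {
        while (pending.Count > 0 && isConnected())
        {
            send(pending.Peek());
            pending.Dequeue(); // removed only after send returns without throwing
        }
    }
}
```

Because an update is dequeued only after `send` completes, a failed call leaves it at the head of the queue for the next flush.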
As you can see, when designing an application that will be living “in the wild”, outside the corporate firewall, there are numerous challenges that simply don’t exist when building enterprise applications that run in well-known conditions, safe and secure within a corporate datacenter or colocation facility. Performance and usability will make or break the usage and acceptance of any mobile application. Now that users are used to the snappy and responsive interfaces of their modern iPhone, Android, and Windows Phone platforms, they will loathe using any application with a sluggish, unusable interface. Accessing data on a mobile device can be a whole new ball game for enterprise developers who haven’t worked with web services or have spent years writing and maintaining classic two-tier or mainframe-based applications. Security is rarely a concern for developers writing applications that are safely tucked away behind a corporate firewall and intrusion-protection systems, but when exposing APIs with access to business-critical information to the public Internet, there is no way that security-through-obscurity will suffice any longer. Connectivity is especially challenging to design around on a mobile device that will commonly have a very slow connection or no connection at all – for an enterprise application running in a data center, on the other hand, it can usually be assumed there is a redundant, high-availability, high-bandwidth Internet connection available.
In summary, while it can be challenging, there are well known solutions for each of the previously mentioned issues. And though each mobile platform will have its own specific best practices for each area, many of the best practices are standard across all mobile platforms, regardless of the technology used.
West Monroe Partners is a full-service provider of business and technology solutions. Our consulting professionals are experienced in developing enterprise and consumer mobile applications on the Apple iOS, Android, and Windows Phone 7 platforms. Please contact us for more information on how our mobile application development team can work with you to ensure that your mobile applications follow the appropriate best practices.
Have you ever built a search using a SQL LIKE statement, only to have your users complain about functionality? A simple SQL-based search doesn’t handle synonyms, misspellings, prefixes, suffixes, result rankings, weighting, and so on. Fret no longer: you can spend a little more time and build a “smart” search using Lucene and get all of these features as well as the ability to tweak the search as much as you like.
Lucene.NET is a direct port of the popular open source Java Lucene project. Large companies such as EMC and Cisco have placed bets on Lucene and embedded the library within some of their products. The .NET version is a little bit behind the Java version in terms of features and releases, but by and large the library is very usable. Lucene can be used to index just about any type of content – including files, database records, and web pages – and can be used in any number of architectural scenarios – searching in an ASP.NET web site, searching within a desktop app, search as a web service or Windows service, etc.
In the most simple search scenario – architecturally, you have to build an Indexer and a Searcher. You can think of Lucene as a set of tools that will do most of the work for you in building these components – you have to use Lucene to build an index and dump your searchable content into that index, and you have to tell Lucene how to search the index that you’ve built. Conceptually, the index is built out of the content that you want to search, whether it be files or database records. If you change the content you want to search on (for example, you’ve added a new file), then you have to either append that content to your index or rebuild your index. One strategy is to set up a scheduled process (e.g. using Quartz.NET, a Windows service, or a scheduled task) to periodically re-index your content.
First things first, you have to add the Lucene libraries to your project. On the Lucene.NET web site, you’ll see the most recent release builds of Lucene. These are two years old. Do not grab them, they have some bugs. There has not been an official release of Lucene for some time, probably due to resource constraints of the maintainers. Use Subversion (or TortoiseSVN) to browse around and grab the most recently updated Lucene.NET code from the Apache SVN Repository. The solution and projects are Visual Studio 2005 and .NET 2.0, but I upgraded the projects to Visual Studio 2008 without any issues. I was able to build the solution without any errors. Go to the bin directory, grab the Lucene.Net dll and add it to your project.
Step two is building your searchable index. A Lucene index is usually stored as a set of files on the file system, but can also be stored in memory for performance – and there are even proof of concept projects available that allow you to store the index in a database (though I’m not sure why you would).
A couple of Lucene concepts/classes you should be aware of for indexing include Documents, Fields, Analyzers, and the IndexWriter. Documents are what you put into your index. They’re not “documents” in the traditional sense, like a Word document – rather, a Document is just an abstraction of an indexable piece of content. It is your responsibility to create the Document objects to place into your Index.
For example, let’s say we’re creating a product search, using Product objects pulled from our database. Our searches will be based on the Product Name.
using Lucene.Net.Documents;

public class Product
{
    public Product() { }

    public string ProductName { get; set; }
    public decimal Price { get; set; }
    public string Color { get; set; }
    public string Id { get; set; }

    // Return a Lucene document for the product
    public Document GetDocument()
    {
        Document document = new Document();
        // Searchable but not stored: the name is analyzed into the index
        document.Add(new Field("ProductName", this.ProductName, Field.Store.NO, Field.Index.ANALYZED));
        // Stored but not searchable: the Id comes back with search results
        document.Add(new Field("Id", this.Id.ToString(), Field.Store.YES, Field.Index.NO));
        return document;
    }
}
We’ll add fields to our Document to represent the values we want to search on or store in our Index. Field.Store.YES/NO indicates whether or not we want to actually store the field's value in our index. Note how I don’t store the Price or Color columns. We don’t want to store the complete objects in Lucene, because it’s just our search index. Keep the complete objects stored in your database (or keep your files on the file system, etc.). We do want to store the Id, because when we get our search result documents back from querying the index, only the stored fields will be returned. We need to at least know the Product Id so we can go fetch the full objects that match our search results from the database. There is also a COMPRESS option that you can use if you need to store large fields or binary data.
Field.Index.ANALYZED/NO indicates whether or not we want to actually index the field. Indexing a field takes some minimal level of processing power, so we don’t want to index every field – only index what you want to search on. Thus we don’t want to Index the Product Id, Color, or Price – only the Name because that’s all we want to search on.
Next, we’ll create the index and add the documents to it. Below is an example of a very simple class with a single method that we can use to build our Product search index using a given list of products.
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

public class Index
{
    public void BuildIndex(List<Product> products)
    {
        FSDirectory directory = FSDirectory.Open(new System.IO.DirectoryInfo("C:\\temp\\"));
        Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        //true = create a new index, replacing any existing one
        IndexWriter indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        foreach (Product product in products)
        {
            indexWriter.AddDocument(product.GetDocument());
        }
        indexWriter.Optimize();
        indexWriter.Close();
    }
}
The FSDirectory is just an abstraction of the storage of the index, and there are “directory” classes that represent in-memory storage, etc. that you can use as well. You can pass a DirectoryInfo object to the Open method to specify where to store the search index.
The Analyzer’s job is to parse and tokenize your data before it is indexed. There are a number of different Analyzers implemented in Lucene, but the StandardAnalyzer is the most straightforward. The StandardAnalyzer will do a few things to your text, including removing junk search terms (aka “stop words”) and punctuation, and normalizing the case of your text. There are a number of constructors available for the StandardAnalyzer, and you can specify your own stop words if you like, but there is a list of common stop words built into Lucene. There is another good analyzer available called the SnowballAnalyzer, which reduces words to their stems (mostly by stripping suffixes) and can greatly improve your search results. The SnowballAnalyzer is a separate Lucene project that lives outside of the main source code, under the contrib folder in the Lucene source (not in the main Lucene.Net solution). Build it yourself and include it in your project if you would prefer to use it instead of the StandardAnalyzer.
The IndexWriter is responsible for creating the index. The IndexWriter is actually thread safe, and an index can be rebuilt while being read from at the same time without you having to manage the locking of the index files. Lucene takes care of that for you. There is a boolean parameter on the constructor that indicates whether to recreate the index (true) or append to an existing one (false). Simply call the AddDocument method on the IndexWriter to write documents to the index. When you’re finished writing documents to the index, you must call the Close method. Optionally, you can call the Optimize method before closing the index, which will greatly shrink the size of the index. However, this can sometimes take a few seconds, so you may not want to call Optimize if you have indexing performance concerns.
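The in-memory storage mentioned above can be handy for tests or small data sets. Here is a minimal sketch, assuming the same Lucene.NET 2.9-era API as the rest of this post and the Product class shown earlier. The InMemoryIndex class name is just an illustration, not something from Lucene.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical helper: builds the product index in memory instead of on disk.
public class InMemoryIndex
{
    public Directory Build(List<Product> products)
    {
        // RAMDirectory keeps the whole index in process memory.
        Directory directory = new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        IndexWriter indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            foreach (Product product in products)
            {
                indexWriter.AddDocument(product.GetDocument());
            }
        }
        finally
        {
            // Close even if adding a document throws, so the write lock is released.
            indexWriter.Close();
        }
        return directory;
    }
}
```

The returned Directory can be handed to a reader in the same way as an FSDirectory. Keep in mind the index disappears when the process exits, so it would need to be rebuilt on startup.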
Now that we have the Index built, we can move on to actually searching the index…
Below is an example method that you could use to search your newly created product search index. You could potentially add it to your Index class. You’ll see a few of the same classes from the indexing sample being used in the search method. As in the previous example, you’ll use the FSDirectory class to specify where the index is located. Then, you’ll need to create an IndexReader, passing in your directory object. The second parameter of IndexReader.Open specifies whether or not to open the index in read-only mode. For our simple purposes, we only need to read from the index. One thing to note about the IndexReader is that it is fairly expensive to create, so you don’t want to create one every time you do a search in your web application, for example. Create a single IndexReader, perhaps in a singleton pattern or by caching the IndexReader object, and re-use it. Next, we need an IndexSearcher to actually search our index, which is fairly straightforward.
When searching, the search queries must be parsed and tokenized in the same way that the data was parsed when it was placed into the index. Due to this, one very important thing to note is that when searching, the same type of Analyzer that was used to create the index must also be used to parse the search queries. If a StandardAnalyzer is used to create the index, a StandardAnalyzer must also be used to parse search queries against the index. The QueryParser actually parses the query text against the field that is going to be searched against – as you can see in the QueryParser constructor, we’ll be searching against the “ProductName” field from our documents. After that, simply call the Parse method on the QueryParser to get the Query that we’ll pass to the searcher. To note, if you want to search on multiple fields – say we wanted to search on the Product Name and the Color, you can use the MultiFieldQueryParser class to query against multiple fields. With the MultiFieldQueryParser, you can even do some clever things like weighting fields differently, i.e. if I wanted product name matches to rank higher than color matches.
Next, we’ll create a collector that will define how the search results are collected from the searcher. We’ll use a TopScoreDocCollector. The first parameter is the maximum number of results to collect. The second parameter (docsScoredInOrder) tells the collector whether documents will be scored in index order; it does not change the ordering of the results. A TopScoreDocCollector always returns its hits sorted by relevance score, best match first, which is what we want to show customers. From there, simply call the Search method on the searcher, passing in the query and the collector, and receive a collection of scored matches based on the search query. For each match, you can call the Doc method on the searcher to retrieve the Document that was placed in the Index originally (only its stored fields are populated). After I’ve collected the Product IDs from the search result documents, I go back and fetch the full Product objects from the database. Depending on what fields you choose to store in your Lucene index, you may not need to re-fetch what you’re searching for from the database. It’s a good idea to store just enough data to display the search results, so that you don’t need to make a trip to the database only to display them.
//Requires the Lucene.Net.Analysis, Lucene.Net.Analysis.Standard, Lucene.Net.Documents,
//Lucene.Net.Index, Lucene.Net.QueryParsers, Lucene.Net.Search and Lucene.Net.Store namespaces.
public List<Product> SearchProductName(string productName)
{
    FSDirectory directory = FSDirectory.Open(new System.IO.DirectoryInfo("C:\\temp\\"));
    IndexReader reader = IndexReader.Open(directory, true);
    Searcher searcher = new IndexSearcher(reader);
    //Must be the same type of Analyzer that was used to build the index.
    Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
    QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "ProductName", analyzer);
    Query query = parser.Parse(productName);
    TopScoreDocCollector collector = TopScoreDocCollector.create(100, true);
    searcher.Search(query, collector);
    ScoreDoc[] hits = collector.TopDocs().scoreDocs;
    List<int> productIds = new List<int>();
    foreach (ScoreDoc scoreDoc in hits)
    {
        //Get the document that represents the search result.
        Document document = searcher.Doc(scoreDoc.doc);
        int productId = int.Parse(document.Get("Id"));
        //The same document can be returned multiple times within the search results.
        if (!productIds.Contains(productId))
        {
            productIds.Add(productId);
        }
    }
    //We are done with the index, so release it before going to the database.
    searcher.Close();
    reader.Close();
    analyzer.Close();
    //Now that we have the product Ids representing our search results, retrieve the products from the database.
    //ProductDAO is the application's own data access class.
    List<Product> products = ProductDAO.GetProductsByIds(productIds);
    return products;
}
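The example method opens and closes an IndexReader on every call, which is exactly what you want to avoid in a busy web application. Here is a minimal sketch of one way to share a single reader, assuming the Lucene.NET 2.9-era API used in this post. The SharedIndexReader class and its Get method are hypothetical names of my own.

```csharp
using System.IO;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical helper: lazily opens one read-only IndexReader and hands it out to all callers.
public static class SharedIndexReader
{
    private static readonly object SyncRoot = new object();
    private static IndexReader reader;

    public static IndexReader Get(string indexPath)
    {
        // The lock ensures only one thread ever opens the reader.
        lock (SyncRoot)
        {
            if (reader == null)
            {
                FSDirectory directory = FSDirectory.Open(new DirectoryInfo(indexPath));
                reader = IndexReader.Open(directory, true);
            }
            return reader;
        }
    }
}
```

A search method would then call SharedIndexReader.Get("C:\\temp\\") instead of IndexReader.Open, and would not close the reader when it finishes. Note that a reader only sees the index as it was when it was opened, so a real application would also need a way to swap in a fresh reader after re-indexing.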
Again, keep in mind this is only an example method. The examples above are based around searching rows that live in a database, but they could easily be adapted to searching through a directory of files, or searching through indexed web pages. The Lucene class structure seems highly abstracted to me, but this is to allow for ultimate flexibility. Search is a finicky thing and you’ll always run into scenarios where your client doesn’t like the way the search works. That’s fine, because Lucene gives you the flexibility to change how the search works.
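As one example of that flexibility, the multi-field search mentioned earlier could be sketched like this. It assumes the Lucene.NET 2.9-era API, and it assumes that Color was added to each Document as an ANALYZED field, which the GetDocument example in this post does not do. The ProductQueries class is a hypothetical name.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical helper: builds a query that matches the search text in either field.
public static class ProductQueries
{
    public static Query BuildNameOrColorQuery(string searchText)
    {
        // Same analyzer type as the one used to build the index.
        Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        // Assumption: both fields were indexed; "Color" is not indexed in the earlier example.
        string[] fields = new string[] { "ProductName", "Color" };
        MultiFieldQueryParser parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, fields, analyzer);
        return parser.Parse(searchText);
    }
}
```

The resulting Query is passed to the searcher exactly like the single-field one. MultiFieldQueryParser also has a constructor overload that takes per-field boosts for weighting; the type of that parameter differs between Lucene.NET releases, so check the version you built before relying on it.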
I was recently trying to resurrect an older project developed on Windows XP with .NET Framework 2.0, Visual Studio 2005, NHibernate, and SQL Server CE 3.1. I’ve since moved to Windows 7 (64-bit) and Visual Studio 2008.
I ran into a surprising number of hurdles while trying to get the application up and running again on 64-bit Windows 7. I figured I would document them here, just in case anyone else runs into the same issues.
Step 1) Try to build the solution. Everything builds fine after installing SQL Server Compact Edition.
Step 2) Try to run the application. Get an exception immediately:
“Could not create the driver from NHibernate.Driver.SqlServerCeDriver.”
InnerException:
“The IDbCommand and IDbConnection implementation in the assembly System.Data.SqlServerCe could not be found. Ensure that the assembly System.Data.SqlServerCe is located in the application directory or in the Global Assembly Cache. If the assembly is in the GAC, use <qualifyAssembly/> element in the application configuration file to specify the full name of the assembly.”
Turns out the issue here is that the System.Data.SqlServerCe dll has to be in the same folder as the application executable. Pretty easy fix – set Copy Local to ‘True’ on the reference to System.Data.SqlServerCe.
Step 3) Run the application again – now I get a different exception:
“Unable to load DLL ‘sqlceme35.dll’: The specified module could not be found. (Exception from HRESULT: 0x8007007E)”
Turns out the issue with this exception is that SQL Server Compact Edition is built for x86 and has to run in WoW mode on x64 systems. My solution platform is set to ‘Any CPU’, which worked fine when I was developing on 32-bit Windows XP but makes the application run as a 64-bit process on 64-bit Windows. To fix the issue, go through all of the Visual Studio projects, go to Properties > Build > Platform Target, and set Platform Target to ‘x86’ instead of ‘Any CPU’.
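If you want to confirm which mode the application is actually running in after changing the Platform Target, the pointer size gives it away. This is a small sketch of my own, not part of the original fix; the PlatformCheck class is a hypothetical name.

```csharp
using System;

// Hypothetical helper: reports whether the current process is 32-bit or 64-bit.
public static class PlatformCheck
{
    // Maps a pointer size in bytes to a readable label.
    public static string Describe(int pointerSize)
    {
        return pointerSize == 8 ? "64-bit" : "32-bit";
    }

    // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
    public static string DescribeCurrentProcess()
    {
        return Describe(IntPtr.Size);
    }
}
```

Logging PlatformCheck.DescribeCurrentProcess() at startup should report 32-bit once every project targets x86, even on 64-bit Windows.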
Step 4) Try to run the application again… and I get yet another exception:
“ADOException: cannot open connection” with InnerException of:
“The database file has been created by an earlier version of SQL Server Compact. Please upgrade using SqlCeEngine.Upgrade() method.”
This is kind of annoying. The Visual Studio 2008 Upgrade Wizard changed all my references from SQL Server CE 3.1 to SQL Server CE 3.5. How thoughtful. Unfortunately, I don’t know what the implications of ‘upgrading’ the database are. Everything worked fine with 3.1, so why introduce any more change to the application? So, I set the references back to SQL Server CE 3.1 instead of 3.5.
Step 5) Run the application… again.
No exceptions! Everything works with SQL Server CE 3.1! Upgrade complete.
One thing that any web developer worth their salt should know is the basics of search engine optimization (SEO). Much of SEO comes down to basic code-level best practices, and it isn’t terribly difficult to simply bake SEO into your development process when working on public facing web applications. However, keep in mind that SEO will always be an evolving, fuzzy science, changing on the whim of the indexing strategies of major search engines. Immediate results are rare, and a long term process should be in place to truly understand the benefit (or detriment) incurred.
I break the concept of SEO down into a few categories that I’ll explain further below…
These ‘Content / Internal’ best practices are things that a developer or content creator can bake in during the site development process. Only a few of these items will make a difference on their own, but as a whole can make an enormous impact. These basic factors should lay the foundation for any SEO strategy. However, these internal factors absolutely cannot be the only part of your SEO strategy. Here are a few of the most important ones…
Strategic SEO includes all of the factors external to your website that can affect your search engine rankings. The number one external factor is getting ‘backlinks’ to your content. This is what made Google so ridiculously powerful and accurate, and their rankings are still very much based on the number, diversity, and quality of links to your site.
Backlinking can be explained with this anecdote: Several years ago you could search for ‘Miserable Failure’ on Google and the number one result was the White House biography page for George Bush. This was due to a simple viral campaign to get people to put links on their websites, comments, blog posts, etc. linking to the biography page with the anchor text ‘Miserable Failure’. That’s how backlinks work. The more external, inbound links to your site, the more ‘authoritative’ your site appears to be in the eyes of major search engines.
But how can you get these backlinks? A few examples…
As mentioned previously, part of SEO is a process of testing your SEO changes and tracking their effectiveness over time. A variety of free and paid tools are available to assist you in analyzing your search rankings, search terms, and keyword effectiveness. Below I’ve listed a few tools that can help.
There is much more to search engine optimization than can be written up in a single blog post (see also: thousands of blogs dedicated purely to the subject). However, I hope this quick guide to the basics will give you the tools necessary to implement numerous high impact SEO quick wins for a client or personal web site. For web developers, the factors listed above should be kept in mind whenever developing customer-facing websites that could benefit from enhanced search results and search rankings. Most of the ‘content / internal’ best practices can be easily baked into the development process of almost any e-commerce or content management system implementation project.
The impact of performance is much more readily apparent in .NET Compact Framework applications. The mobile devices commonly have a CPU that is 10 times slower than your desktop CPU, and possibly up to 100 times less RAM than a desktop or server. In Agile or XP development, the mantra is often to ignore performance considerations until necessary – I don’t think you can apply that to .NET CF development or it will really bite you in the end. You don’t have to go nuts and optimize everything up front, but there are some very important things to keep in mind when developing a Windows Mobile application…
Many of the standard .NET Framework performance best practices can become apparent very quickly including…
However, the .NET Compact Framework is different than the full framework in many ways, leading to a slew of .NET CF specific performance considerations…
I’m starting up a short Windows Mobile project again, so I thought it would be a good time to collect some of my best practices for .NET Compact Framework development and post them. I’m going to break them down into two sections - usability, and performance best practices (in another post).
Microsoft has put together a very specific set of guidelines for Windows Mobile usability. The point of this is to get a consistent look and feel and a consistent application experience on their platform. Apple has the same sort of guidelines for iPhone development and it really pays off: most applications have the same consistent look and feel and excellent usability. Of course, many of these usability guidelines are relevant across many development platforms, but there are some special considerations for mobile development.
Usability is a challenge in mobile development. Some of the main concerns include…
Here are some of the most important usability guidelines that Microsoft has set forth…
Over time while using ASP.NET I’ve collected a pretty good handful of best practices that I try to employ on my projects – most of them are things that will simplify the ASP.NET development experience, solutions to common problems, or tips that will just make your life easier. Most of the best practices are only applicable to WebForms, but some are applicable to ASP.NET MVC as well.
set { ViewState["SampleId"] = value; }
}
I ran into an interesting problem yesterday. A few months ago we helped a client redesign an ASP.NET web application so that it fit into an iframe within their CMS rather than being a standalone site. Easy enough task. Testing was completed and the site was rolled out.
Now, several months down the road after the application has been iframe’d and in production – one random feature of the application is unexpectedly breaking, but it doesn’t make any sense – the only way the behavior could possibly occur would be that an object retrieved from Session is coming back as null, which turned out to be the case. The browser was somehow losing the ASP.NET Session cookie. Furthermore, the feature was working fine in Firefox but not in Internet Explorer, very strange.
The problem was that Internet Explorer will not accept cookies from a page within an iframe where the domain name is different from the top level page. So, the url of the iframe’d page was www.clientsite1.com and the url of the page hosting the iframe was www.clientsite2.com.
To get around this, you need to add a P3P Compact Policy to your HTTP responses. P3P is a protocol that allows websites to pass information to the browser regarding their intent to use information collected from the user. Internet Explorer is the only browser that implements the protocol, and it only uses it for cookie blocking at that.
To add a P3P in ASP.NET that will allow your cookies to be accepted by the browser from a different domain from within an iframe, add this block of code to your Global.asax.
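A minimal sketch of such a block, assuming a standard Global.asax code-behind class. The policy tokens in the header value are only an assumed example; they should be replaced with a compact policy that reflects your site's real privacy policy.

```csharp
using System;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        // Send a P3P compact policy with every response so Internet Explorer
        // accepts the session cookie inside a cross-domain iframe.
        // "CAO PSA OUR" is an assumed example policy, not a recommendation.
        HttpContext.Current.Response.AddHeader("P3P", "CP=\"CAO PSA OUR\"");
    }
}
```

The header has to be sent by the site inside the iframe (www.clientsite1.com in the example above), since that is the site whose cookies are being blocked.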
