Wednesday, October 29, 2008

Static File Servers and Drupal

Any call to a Drupal site has two aspects associated with it:

The dynamic HTML that gets created on access of the website, plus the static elements (such as CSS, JS, and images) that are embedded in it.

A request to a website from a browser = one dynamic HTML call to the web server + one call for each associated static element in the HTML, to the same web server.

The amount of memory required to serve a dynamic request is much more than that required for a static file request. Moreover, in Apache the same heavyweight processes end up serving static files as well. To avoid this, we can place all static files on a separate web server and configure that server to handle static requests. This greatly improves the performance and scalability of a Drupal website.

A Drupal site with lots of modules installed that handles a lot of data from the database can easily require 64M of memory per thread. This is a huge expenditure of memory compared to the 1-2M it takes to serve a static file. Since Apache recycles its worker threads, you end up in a situation where the same 64M monster that created the Drupal HTML is also used for serving a .jpg file. This is a huge waste of resources.
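One common way to split the two workloads (a sketch only; the hostname and file extensions are assumptions) is to rewrite static-asset requests out of the main Apache vhost to a lightweight static host:

```apache
# Sketch: send requests for static assets to a separate, slim server
# at static.example.com (hostname is an assumption).
RewriteEngine On
RewriteRule ^/(.+\.(css|js|gif|png|jpe?g|ico))$ http://static.example.com/$1 [R=301,L]
```

The static host can then run a stripped-down Apache (few modules, small processes) or a server like lighttpd, so the 64M Drupal workers only ever handle dynamic requests.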


For a private server the needs are different. Static file serving still applies to the images, CSS, and JS files associated with the HTML, but for the main application files we need to use private files. Private files make the file-handling process itself dynamic: PHP must check permissions and resolve the location of each file. This is handled by a separate menu callback in Drupal. We cannot serve these as static files.
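That menu callback hands control to modules via hook_file_download(); a minimal sketch against the Drupal 6 API, where the module name and the permission string are assumptions:

```php
<?php
// Sketch of a private-file access check (Drupal 6 API).
// 'mymodule' and the 'view private files' permission are hypothetical.
function mymodule_file_download($filepath) {
  if (user_access('view private files')) {
    // Grant access: return the headers Drupal should send with the file.
    $full = file_create_path($filepath);
    return array('Content-Type: ' . file_get_mimetype($full));
  }
  return -1;  // Deny access.
}
```

Because every download runs through PHP like this, private files trade the speed of static serving for per-request access control.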

In this case we are looking at three approaches:

1. To handle files placed in any physical directory (not under the web root) within the same server

2. To handle files placed in any physical directory (not under the web root) on a different server - here we are using the FTP protocol to access and place files on the file server. This is custom code written by us.

3. To handle files placed in any physical directory (not under the web root) on a different server - using cURL
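For approaches 2 and 3, the transfer step might look like this (a sketch; hostnames, credentials, and paths are assumptions, and error handling is omitted):

```php
<?php
// Approach 2: push the file to the remote file server over FTP.
$conn = ftp_connect('files.example.com');           // assumed host
ftp_login($conn, $ftp_user, $ftp_pass);
ftp_put($conn, '/private/report.pdf', $local_path, FTP_BINARY);
ftp_close($conn);

// Approach 3: push the same file with an HTTP PUT via cURL.
$ch = curl_init('http://files.example.com/private/report.pdf');
curl_setopt($ch, CURLOPT_PUT, TRUE);
curl_setopt($ch, CURLOPT_INFILE, fopen($local_path, 'rb'));
curl_setopt($ch, CURLOPT_INFILESIZE, filesize($local_path));
curl_exec($ch);
curl_close($ch);
```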

Questions that arise: which of these approaches is scalable? Which is most secure?

Wednesday, October 15, 2008

The Semantic web and Search

The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web, closer to how a user would look for information.

Achieving this would indeed be awesome. Imagine the power it would add to our searches. With our websites holding so much information, users could benefit greatly from semantic search. Dries's article and the SearchMonkey video demonstration sum it all up! The trend in search today - read on ...

Here is what Dries Buytaert says about Drupal, the semantic web, and search:
What if instead of having a custom indexer designed just for Pivots, we had a rich indexer with lots of meta information and semantic tagging?

Here is another example. Imagine a standard Drupal node-type called 'job'. The fields in the job node-type would have RDF properties associated with them mapping to salary, duration, industry, location, and so on. Creating a new job posting on a Drupal site would generate RDFa that semantic search engines like Yahoo!'s SearchMonkey would pick up and the job would be included in their world-wide job database. Read more...
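Such a job node's output might carry RDFa along these lines (a sketch; the vocabulary URI and property names are invented for illustration, not real Drupal output):

```html
<!-- Illustrative RDFa for a 'job' node; the vocabulary is hypothetical. -->
<div xmlns:job="http://example.org/job#" typeof="job:Posting">
  <span property="job:industry">Software</span> position in
  <span property="job:location">Chennai</span>,
  salary <span property="job:salary">40000</span>,
  duration <span property="job:duration">12 months</span>.
</div>
```

A semantic crawler reads the property attributes rather than guessing the meaning from surrounding text.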

SearchMonkey is Yahoo! Search's new open platform.

Using SearchMonkey, developers and site owners can use structured data to make Yahoo! Search results more useful and visually appealing, and drive more relevant traffic to their sites. Read more...

Sunday, October 12, 2008

Nagarajan in Drupal Chennai

I am proud to announce that my colleague Nagarajan has started a Drupal blog. He has expert knowledge of various aspects of Drupal! For a very informative article on performance improvement guidelines for Drupal sites, visit:

Friday, October 10, 2008

Parameters in Apache Solr Schema.xml

Indexing using Apache Solr:

In schema.xml we can create various datatypes. Every datatype can be associated with one and only one analyzer. The number of datatypes and their associated analyzers define how we want to index the content. We can have one analyzer for each of the columns in our database!
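Concretely, a datatype and its single analyzer are declared together in schema.xml, and each database column can then be mapped to a field of that type (the field and type names below are illustrative):

```xml
<!-- One analyzer per fieldType; one fieldType per kind of column. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title" type="text" indexed="true" stored="true"/>
```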

1. The main DATA CLASSES:

1.1 Text

Class: predefined Java classes that define the content datatype (for text, solr.TextField)

sortMissingLast="true" (a sort on this field will cause documents
without the field to come after documents with the field)

omitNorms="true"
omitNorms is set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
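Both attributes go on the fieldType (or field) declaration; for example, a sort-only text type can be declared roughly like this (analyzer omitted for brevity):

```xml
<fieldType name="alphaOnlySort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true"/>
```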

1.2 Numeric:

class="solr.IntField" OR class="solr.SortableIntField"

With the sortable type, text placed in an int field will be converted to an integer and can be sorted.
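The two integer flavours, side by side, as they might appear in schema.xml:

```xml
<!-- Plain int: compact, but not usable for numeric sorting in Solr 1.x. -->
<fieldType name="integer" class="solr.IntField" omitNorms="true"/>
<!-- Sortable int: values are stored in a form that sorts numerically. -->
<fieldType name="sint" class="solr.SortableIntField"
           sortMissingLast="true" omitNorms="true"/>
```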

1.3 Date:

Format for the date field:
1995-12-31T23:59:59Z (the trailing "Z" designates UTC time and is mandatory)


You can perform date math operations on the date field and store the result in the index:
... Round to the start of the current hour
... Exactly 1 day prior to now
... 6 months and 3 days in the future from the start of the current day
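In Solr's DateMath syntax (see DateMathParser), operations of this kind are written along these lines:

```
NOW/HOUR                -- round to the start of the current hour
NOW-1DAY                -- exactly 1 day prior to now
NOW/DAY+6MONTHS+3DAYS   -- 6 months and 3 days in the future,
                           from the start of the current day
```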

For date field details refer to the Javadocs; a probable use case is time-based faceted search.

2. Analyzers: Tokenizers and Tokens

If you want different columns in your database to use different tokenizers, they must be associated with different datatypes in Solr. Over and above the tokenizers, the text can be further processed using token filters. We can also use the predefined analyzer classes in Java and simply include them; they are:

BrazilianAnalyzer, ChineseAnalyzer, CJKAnalyzer, CzechAnalyzer, DutchAnalyzer, FrenchAnalyzer, GermanAnalyzer, GreekAnalyzer, KeywordAnalyzer, PatternAnalyzer, PerFieldAnalyzerWrapper, QueryAutoStopWordAnalyzer, RussianAnalyzer, ShingleAnalyzerWrapper, SimpleAnalyzer, SnowballAnalyzer, StandardAnalyzer, StopAnalyzer, ThaiAnalyzer, WhitespaceAnalyzer

Defining custom analyzers is a matter of combining tokenizers and token filters.
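A custom analyzer, then, is just such a chain declared inline in schema.xml (the type name and filter choices below are illustrative):

```xml
<fieldType name="textStemmed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```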

3. PositionIncrementGap:

A position increment gap controls the virtual space between the last token of one field instance and the first token of the next instance. With a gap of 100, this prevents phrase queries (even with a modest slop factor) from matching across instances.
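For example, with a multiValued field (the field name and the sample document are invented):

```xml
<!-- positionIncrementGap="100" on the field's type puts 100 virtual
     positions between the two values below, so the phrase query
     "Doe Jane" cannot match across them, while "John Doe" still
     matches within a single value. -->
<field name="author" type="text" multiValued="true"/>
<!-- document: author = ["John Doe", "Jane Smith"] -->
```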

Which tokenizer do I use? Which token filters should I apply? How should I combine tokenizers and token filters into my analyzers? The answers depend on the business rules of the search engine.

Sunday, October 5, 2008

Drupal in 2001

This is how Drupal looked in 2001; it was called Drop ("dorp" means village in Dutch) then!


Shyamala's Drupal SEO