Lua bindings for Xapian

The Lua bindings for Xapian are packaged in the xapian namespace, and largely follow the C++ API, with the following differences and additions.

These bindings require Lua 5.1 or later.

The examples subdirectory contains examples showing how to use the Lua bindings based on the simple examples from xapian-examples: simpleindex.lua, simplesearch.lua, simpleexpand.lua.

Unicode Support

In Xapian 1.0.0 and later, the Xapian::Stem, Xapian::QueryParser, and Xapian::TermGenerator classes all assume text is in UTF-8. A Lua string is an arbitrary sequence of values which have at least 8 bits (octets); they map directly onto the char type of the C compiler. Lua does not reserve any value, including NUL, so a Lua string can hold UTF-8 text without problems.
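As a small illustration of the point above, the following pure-Lua sketch (it makes no Xapian calls, and the variable names are invented for the example) shows that a string is just a sequence of bytes: the length operator counts bytes rather than characters, and an embedded NUL is kept like any other value.

```lua
-- "café" encoded as UTF-8; the "é" is the two bytes 195 and 169 (0xC3 0xA9).
local utf8_text = "caf\195\169"

-- The length operator counts bytes, not characters: 3 ASCII bytes + 2 bytes.
local byte_length = #utf8_text

-- string.byte returns the numeric values of the bytes in the given range.
local b4, b5 = string.byte(utf8_text, 4, 5)

-- NUL is not special: it is stored like any other byte.
local with_nul = "a\0b"

print(byte_length, b4, b5, #with_nul)
```

Because the bytes are passed through unchanged, it is up to the script to make sure the text it hands to Xapian really is UTF-8.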

Exceptions

Exceptions thrown by Xapian are translated into Lua Exception objects which are thrown into the Lua script.
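A minimal sketch of catching such an error with Lua's standard pcall follows. It assumes the binding is loaded with require("xapian") and exposes a global xapian table, and it uses a made-up path that is expected not to exist. The exact form of the error value is not specified here, so the sketch only converts it with tostring.

```lua
require("xapian")

-- Hypothetical path, chosen so that opening the database fails.
local bad_path = "/nonexistent/xapian/db"

-- pcall runs the function in protected mode and returns false plus the
-- error value if anything inside raises an error.
local ok, err = pcall(function()
	return xapian.Database(bad_path)
end)

if not ok then
	print("Xapian error: " .. tostring(err))
end
```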

Iterators

All iterators support next and equals methods to move through and test iterators (as for all language bindings). MSetIterator and ESetIterator also support prev. Because "end" is a keyword in Lua, the C++ end() method is renamed to "_end" in Lua. The following example iterates over an MSet to get the rank, percent, docid and data of each document in it.

-- Returns an iterator function suitable for a generic "for" loop.
function msetIter(mset)
	local m = mset:begin()
	return function()
		-- Returning nil ends the loop once the end of the MSet is reached.
		if m:equals(mset:_end()) then
			return nil
		else
			local rank = m:get_rank()
			local percent = m:get_percent()
			local id = m:get_docid()
			local data = m:get_document():get_data()
			m:next()
			return rank, percent, id, data
		end
	end
end

Iterator dereferencing

C++ iterators are often dereferenced to get information, e.g. (*it). In Lua these are all mapped to named methods, as follows:

Iterator          Dereferencing method
PositionIterator  get_termpos
PostingIterator   get_docid
TermIterator      get_term
ValueIterator     get_value
MSetIterator      get_docid
ESetIterator      get_term

Other methods, such as MSetIterator:get_document, are available under the same names.
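As a sketch of how a named dereferencing method combines with next and equals, the following walks the term list of a document built in memory, so no database is needed. It assumes the binding is loaded with require("xapian") and exposes a global xapian table, and that the C++ methods Document::add_term, termlist_begin and termlist_end are wrapped under the same names. The terms and the variable names are illustrative.

```lua
require("xapian")

-- Build a document with two terms.
local doc = xapian.Document()
doc:add_term("hello")
doc:add_term("world")

-- Collect the terms by stepping a TermIterator from begin to end.
local terms = {}
local it = doc:termlist_begin()
while not it:equals(doc:termlist_end()) do
	-- get_term plays the role of (*it) in C++.
	terms[#terms + 1] = it:get_term()
	it:next()
end

print(table.concat(terms, " "))
```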

MSet

MSet objects have some additional methods to simplify access (these work using the C++ array dereferencing):

Method name                     Explanation
get_hit(index)                  returns MSetItem at index
get_document_percentage(index)  convert_to_percent(get_hit(index))
get_document(index)             get_hit(index):get_document()
get_docid(index)                get_hit(index):get_docid()

Non-Class Functions

The C++ API contains a few non-class functions (the Database factory functions, and some functions reporting version information), which are wrapped like so for Lua:

Constants

For Lua, constants are wrapped as xapian.CONSTANT_NAME or xapian.ClassName_CONSTANT_NAME. So Xapian::DB_CREATE_OR_OPEN is available as xapian.DB_CREATE_OR_OPEN, Xapian::Query::OP_OR is available as xapian.Query_OP_OR, and so on.

Query

In C++ there's a Xapian::Query constructor which takes a query operator and start/end iterators specifying a number of terms or queries, plus an optional parameter. In Lua, it is wrapped to accept Lua tables to give the terms/queries, and you can specify a mixture of terms and queries if you wish. For example:

   subq = xapian.Query(xapian.Query_OP_AND, {"hello", "world"})
   q = xapian.Query(xapian.Query_OP_AND, {subq, "foo", xapian.Query("bar", 2)})

MatchAll and MatchNothing

These aren't yet wrapped for Lua, but you can use xapian.Query("") instead of MatchAll and xapian.Query() instead of MatchNothing.
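A short sketch of the two substitutes follows. It assumes the binding is loaded with require("xapian") and exposes a global xapian table, and that the C++ methods Query::empty and Query::get_description are wrapped under the same names. The variable names are illustrative.

```lua
require("xapian")

-- A query for the empty term stands in for MatchAll.
local match_all = xapian.Query("")

-- A default-constructed query stands in for MatchNothing.
local match_nothing = xapian.Query()

-- get_description returns a human-readable form, useful when debugging.
print(match_all:get_description())
print(match_nothing:get_description())

-- Only the default-constructed query reports itself as empty.
print(match_all:empty(), match_nothing:empty())
```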

Enquire

There is an additional method get_matching_terms which takes an MSetIterator and returns a list of terms in the current query which match the document given by that iterator. You may find this more convenient than using the TermIterator directly.
