By many measures, Mwmbl is doing great. We have indexed over half a billion pages, we have over 4,000 registered users, and over 30,000 curations from those users. Our volunteers are crawling around 5 million pages a day. But the score that I care about most right now is NDCG. This measures the quality of our search results against a “gold standard” which is just Bing search results for the same query....
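To make the metric concrete, here is a minimal sketch of NDCG (normalized discounted cumulative gain) computed against a gold-standard ranking. It is an illustration under stated assumptions, not Mwmbl's actual evaluation code: the function names `dcg` and `ndcg`, the example URLs, and the choice to grade each gold result by its position in the gold list are all hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1),
    with ranks starting at 1 (hence rank_index + 2 for a 0-based index)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(ranked_urls, gold_urls):
    """NDCG of our ranking against a gold-standard ranking.

    Assumption: a gold result's relevance is higher the earlier it appears
    in the gold list; URLs missing from the gold list get relevance 0.
    """
    gold_relevance = {url: len(gold_urls) - i for i, url in enumerate(gold_urls)}
    gains = [gold_relevance.get(url, 0) for url in ranked_urls]
    # The ideal ordering places the most relevant gold results first.
    ideal = sorted(gold_relevance.values(), reverse=True)[: len(ranked_urls)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example data
gold = ["https://a.example", "https://b.example", "https://c.example"]
ours = ["https://b.example", "https://a.example", "https://x.example"]
score = ndcg(ours, gold)
```

A score of 1.0 means our ranking matches the gold ordering exactly, and the score drops as gold results are missing or ranked lower than they should be.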
Indexing a billion pages
It’s two years since we launched Mwmbl, the open source, non-profit search engine, on Boxing Day 2021. A good time to take stock of where we are and where we’re going. We’ve indexed over 100 million pages. Thanks to our volunteers, who crawl the web using the Firefox extension and command line script, we’re crawling up to a million pages a day, as you can see on our stats page. There are around 50-60 users crawling on an average day....
Why is curation of web search results important?
Mwmbl is the first search engine to allow users to change the search results: You can add results, delete them, and rerank them. The changes you make are saved instantly to the index and will be shown to other users who run the same query. But what is the point of users changing search results? There are far too many queries to expect them all to be curated by users....
We are entering a new era of web search
We recently launched the new version of Mwmbl which includes the long-awaited feature of allowing users to curate search results. This is an experiment, since we don’t know: Will people want to curate search results? How will we determine what is an objectively good search ranking? How will we deal with and prevent spam? How will we build and manage the community? Can we use curated search results as training for learning to rank?...
Mwmbl Update - Over 100 million pages indexed
Highlights: We now have 105 million pages in our index. We’re crawling around a million pages a day. Around 60 people are helping to crawl the web each day. The beta version of Mwmbl now allows user curation of search results. Side projects: My main problem with side projects is that I tend to stop working on them. It feels like I’ve started so many things only to abandon them, and that makes me feel bad....
Fall 2022 Update
The Mwmbl team took a bit of a break since August to work on some other areas of life, but we have been quietly planning the last several months around a couple of areas. Editing search results: The most requested feature is to be able to suggest sites to crawl. We’re planning to go one better and let users edit the whole search results ranking and add their own search results....
Why the world needs a non-profit search engine
Originally published at daoudclarke.net on 2022-07-10. Sometimes I forget why I’ve taken on this crazy, huge task. Why am I building a search engine? Will it really be better than Google one day? Will people support it? Will people even use it? And then I read something like The Bullshit Web and I remember that, yes, there is a point. Even if I make the web better for one person, it’s worth it....