Indexing a billion pages

It’s two years since we launched Mwmbl, the open source, non-profit search engine, on Boxing Day 2021. A good time to take stock of where we are and where we’re going. We’ve indexed over 100 million pages. Thanks to our volunteers, who crawl the web using the Firefox extension and command line script, we’re crawling up to a million pages a day, as you can see on our stats page. There are around 50-60 users crawling on an average day....

December 23, 2023 · Daoud Clarke

Why is curation of web search results important?

Mwmbl is the first search engine to allow users to change the search results: you can add results, delete them, and rerank them. The changes you make are saved instantly to the index and will be shown to other users who run the same query. But what is the point of users changing search results? There are far too many queries to expect them all to be curated by users....

November 30, 2023 · Daoud Clarke

We are entering a new era of web search

We recently launched the new version of Mwmbl which includes the long-awaited feature of allowing users to curate search results. This is an experiment, since we don’t know: Will people want to curate search results? How will we determine what is an objectively good search ranking? How will we deal with and prevent spam? How will we build and manage the community? Can we use curated search results as training for learning to rank?...

November 29, 2023 · Daoud Clarke

Mwmbl Update - Over 100 million pages indexed

Highlights: We now have 105 million pages in our index. We’re crawling around a million pages a day. Around 60 people are helping to crawl the web each day. The beta version of Mwmbl now allows user curation of search results. Side projects: My main problem with side projects is that I tend to stop working on them. It feels like I’ve started so many things only to abandon them, and that makes me feel bad....

November 1, 2023 · Daoud Clarke

Fall 2022 Update

The Mwmbl team has taken a bit of a break since August to work on some other areas of life, but we have been quietly planning over the last several months around a couple of areas. Editing search results: The most requested feature is to be able to suggest sites to crawl. We’re planning to go one better and let users edit the whole search results ranking and add their own search results....

November 7, 2022 · John Marcellus

Why the world needs a non-profit search engine

Originally published at daoudclarke.net on 2022-07-10. Sometimes I forget why I’ve taken on this crazy, huge task. Why am I building a search engine? Will it really be better than Google one day? Will people support it? Will people even use it? And then I read something like The Bullshit Web and I remember that, yes, there is a point. Even if I make the web better for one person, it’s worth it....

July 10, 2022 · Daoud Clarke