pure negation / stop words search limitations

In the case of a search service disruption the status updates will be shown here.
slotboxed
Posts: 57
Joined: Sun Nov 09, 2003 3:49 am

Post by slotboxed »

I am finding that some patterns cannot be searched for.

I want to find all posts in a certain newsgroup (which has a lot of German posts) that don't include the word "german", but

^"german" search failed

Also, I get too many results when searching for some words, so I tried
"searchword"&".par2" but that yields the same results, so then I tried, by themselves:

".par2" search failed

".rar" search failed

".zip" works

Is this kind of failure due to there being too many results for a given search term?
alex
Posts: 4558
Joined: Thu Feb 27, 2003 5:57 pm

Post by alex »

Yes, those kinds of searches are not supported, and they are problematic in principle given what is currently feasible. It is expensive to use the index for unspecific downloading.

Also try to imagine how a "^german" search pattern could be implemented. The only practical way is to exhaust all possibilities, so in the worst case the search server would need to compare, say, 100 million records against the search pattern.

Overly frequent words, i.e. stop words (when there are too many matches), can also require excessive CPU usage, since they may lead to a kind of linear search as well.

If linear search worked for indexing, you would see a lot of search engines that simply add all headers to a database and then search by applying the pattern to every record, but in practice, with current CPU speeds relative to the number of records, that is too slow.
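To illustrate the point, here is a minimal sketch in Python. It is not how the search server is actually implemented; it assumes a simple word-based inverted index, and the names build_index, search_word and search_not_word are made up for this example. A positive word is answered from its posting list alone, while a pure negation has to walk every record id.

```python
from collections import defaultdict


def build_index(subjects):
    """Map each lowercase word to the set of record ids whose subject contains it."""
    index = defaultdict(set)
    for rec_id, subject in enumerate(subjects):
        for word in subject.lower().split():
            index[word].add(rec_id)
    return index


def search_word(index, word):
    # Positive term: only the posting list for this word is touched.
    return index.get(word.lower(), set())


def search_not_word(index, word, total_records):
    # Pure negation: there is no posting list to start from, so every
    # record id has to be visited (a linear pass over the whole database).
    excluded = index.get(word.lower(), set())
    return {rec_id for rec_id in range(total_records) if rec_id not in excluded}


# Hypothetical sample data, for illustration only.
subjects = [
    "Family Guy s01e01 german",
    "Family Guy s01e02",
    "Some Show s02e03 german",
    "Other Show s01e01",
]
index = build_index(subjects)

print(search_word(index, "german"))
print(search_not_word(index, "german", len(subjects)))
```

With four records the difference is invisible, but the loop in search_not_word grows with the total number of records, not with the number of matches. A stop word behaves similarly, because its posting list covers a large share of all records.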

For such matches it is best to use a Usenet provider that supports compressed headers; then you can download the headers in the newsgroups and use the quick filter to look for what you need.
dengle
Posts: 274
Joined: Mon Jun 30, 2003 2:37 pm

Post by dengle »

I'm not sure if you're adding the quotes just for the forums or not, but don't use quotes in the search.

For example, if you're looking for episodes of Family Guy, a good search could be:

Family?Guy&^german

or

Family&Guy&^german

EDIT: Disregard, as I just reread your post. I defer to Alex's comments :-)
alex
Posts: 4558
Joined: Thu Feb 27, 2003 5:57 pm

Post by alex »

I've "improved" the behaviour a bit.

Now, if the subject pattern is well defined, the author pattern can be anything, and vice versa.

It is equivalent to running a search without the stop word or negation and then applying the quick filter in UE; now one can do it on the server side.
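A rough sketch of that two-step idea in Python, under stated assumptions: the record layout, the word-based subject index and the function name server_side_search are hypothetical and only illustrate the principle, not the real server code. The well-defined subject words narrow the candidates through the index, and the negated parts are then checked only against those candidates.

```python
from collections import defaultdict

# Hypothetical records as (subject, author) pairs, for illustration only.
records = [
    ("Family Guy s01e01 german", "poster_a"),
    ("Family Guy s01e02", "poster_b"),
    ("Family Guy s01e03", "poster_a"),
    ("Other Show s01e01", "poster_b"),
]

# Word-based index over subjects (an assumption made for this sketch).
subject_index = defaultdict(set)
for rec_id, (subject, _author) in enumerate(records):
    for word in subject.lower().split():
        subject_index[word].add(rec_id)


def server_side_search(subject_words, subject_excluded=(), author_excluded=()):
    """Narrow by indexed subject words first, then filter the small candidate set."""
    if not subject_words:
        # Without a well-defined part there is nothing to look up in the index.
        raise ValueError("need at least one indexed subject word")
    candidates = set.intersection(
        *(subject_index.get(w.lower(), set()) for w in subject_words)
    )
    hits = []
    for rec_id in sorted(candidates):
        subject, author = records[rec_id]
        if any(x.lower() in subject.lower() for x in subject_excluded):
            continue
        if any(x.lower() in author.lower() for x in author_excluded):
            continue
        hits.append(rec_id)
    return hits


print(server_side_search(["family", "guy"], subject_excluded=["german"]))
print(server_side_search(["family", "guy"], author_excluded=["poster_a"]))
```

The filter step costs time proportional to the number of candidates, which is why one side of the query has to be specific enough to keep that set small.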