Search Engine
This is a search Engine rank by (bm25, vsm, dirichlet)
How to Run:
# please run code in windows, install requirements.txt first
please make sure all input path and output path are correctly
1. building the index
python run.py [trec-files-directory-path] [index-type] [output-dir]
eg.:
python run.py ./BigSample/ single ./indexes/my-indexes/
python run.py ./BigSample/ stem ./indexes/my-indexes/
python run.py ./BigSample/ phrase ./indexes/my-indexes/
python run.py ./BigSample/ positional ./indexes/my-indexes/
outputType:
{indexType}Lex.json to store inverted index
{indexType}Term.json to store termList
## make sure run 1. to build index first or below will raise exception
2. static query processing
python run.py ./indexes/my-indexes/ ./data/queryfile.txt bm25 stem ./results/results-bm2
python run.py ./indexes/my-indexes/ ./data/queryfile.txt dirichlet stem ./results/results-bm2
python run.py ./indexes/my-indexes/ ./data/queryfile.txt vsm stem ./results/results-bm2
python run.py ./indexes/my-indexes/ ./data/queryfile.txt bm25 single ./results/results-bm2
python run.py ./indexes/my-indexes/ ./data/queryfile.txt dirichlet single ./results/results-bm2
python run.py ./indexes/my-indexes/ ./data/queryfile.txt vsm single ./results/results-bm2
in this project, the code will automatically run "treceval.exe qrel.txt results.txt" after scores are calculated
to test the result for other dataset, change qrelfile, run below instead:
-- treceval.exe [qrel.txt] [results.txt]
3. dynamic query processing
python run.py ./indexes/my-indexes/ ./data/queryfile.txt ./results/results-bm25
in this project, the code will automatically run "treceval.exe qrel.txt results.txt" after scores are calculated
to test the result for other dataset, change qrelfile, run below instead:
-- treceval.exe [qrel.txt] [results.txt]
use command (sh test.sh) to see all test case
command output are stored in testcaseResult.txt