uva1597

maksyuki 发表于 oj 分类，标签: 字符串-简单字符处理, 数据结构-二叉树

24 1月 2018

The word “search engine” may not be strange to you. Generally speaking, a search engine searches the web pages available in the Internet, extracts and organizes the information and responds to users’ queries with the most relevant pages. World famous search engines, like GOOGLE, have become very important tools for us to use when we visit the web. Such conversations are now common in our daily life:

“What does the word like ∗ ∗ ∗ ∗ ∗∗ mean?”

“Um. . . I am not sure, just google it.”

In this problem, you are required to construct a small search engine. Sounds impossible, does it? Don’t worry, here is a tutorial teaching you how to organize large collection of texts efficiently and respond to queries quickly step by step. You don’t need to worry about the fetching process of web pages, all the web pages are provided to you in text format as the input data. Besides, a lot of queries are also provided to validate your system.

Modern search engines use a technique called inversion for dealing with very large sets of documents. The method relies on the construction of a data structure, called an inverted index, which associates terms (words) to their occurrences in the collection of documents. The set of terms of interest is called the vocabulary, denoted as V . In its simplest form, an inverted index is a dictionary where each search key is a term ω ∈ V . The associated value b(ω) is a pointer to an additional intermediate data structure, called a bucket. The bucket associated with a certain term ω is essentially a list of pointers marking all the occurrences of ω in the text collection. Each entry in each bucket simply consists of the document identifier (DID), the ordinal number of the document within the collection and the ordinal line number of the term’s occurrence within the document.

Let’s take Figure-1 for an example, which describes the general structure. Assuming that we only have three documents to handle, shown at the right part in Figure-1; first we need to tokenize the text for words (blank, punctuations and other non-alphabetic characters are used to separate words) and construct our vocabulary from terms occurring in the documents. For simplicity, we don’t need to consider any phrases, only a single word as a term. Furthermore, the terms are case-insensitive (e.g. we consider “book” and “Book” to be the same term) and we don’t consider any morphological variants (e.g. we consider “books” and “book”, “protected” and “protect” to be different terms) and hyphenated words (e.g. “middle-class” is not a single term, but separated into 2 terms “middle” and “class” by the hyphen). The vocabulary is shown at the left part in Figure-1. Each term of the vocabulary has a pointer to its bucket. The collection of the buckets is shown at the middle part in Figure-1. Each item in a bucket records the DID of the term’s occurrence.

After constructing the whole inverted index structure, we may apply it to the queries. The query is in any of the following formats:

term

term AND term

term OR term

NOT term

A single term can be combined by Boolean operators: ‘AND’, ‘OR’ and ‘NOT’ (‘term1 AND term2’ means to query the documents including term1 and term2; ‘term1 OR term2’ means to query the documents including term1 or term2; ‘NOT term1’ means to query the documents not including term1). Terms are single words as defined above. You are guaranteed that no non-alphabetic characters appear in a term, and all the terms are in lowercase. Furthermore, some meaningless stop words (common words such as articles, prepositions, and adverbs, specified to be “the, a, to, and, or, not” in our problem) will not appear in the query, either.

For each query, the engine based on the constructed inverted index searches the term in the vocabulary, compares the terms’ bucket information, and then gives the result to user. Now can you construct the engine?

Input

The input starts with integer N (0 < N < 100) representing N documents provided. Then the next N sections are N documents. Each section contains the document content and ends with a single line of ten asterisks.

**********

You may assume that each line contains no more than 80 characters and the total number of lines in the N documents will not exceed 1500.

Next, integer M (0 < M ≤ 50000) is given representing the number of queries, followed by M lines, each query in one line. All the queries correspond to the format described above.

Output

For each query, you need to find the document satisfying the query, and output just the lines within the documents that include the search term (For a ‘NOT’ query, you need to output the whole document). You should print the lines in the same order as they appear in the input. Separate different documents with a single line of 10 dashes.

----------

If no documents matching the query are found, just output a single line: ‘Sorry, I found nothing.’. The output of each query ends with a single line of 10 equal signs.

==========

Sample Input

4
A manufacturer, importer, or seller of
digital media devices may not (1) sell,
or offer for sale, in interstate commerce,
or (2) cause to be transported in, or in a
manner affecting, interstate commerce,
a digital media device unless the device
includes and utilizes standard security
technologies that adhere to the security
system standards.
**********
Of course, Lisa did not necessarily
intend to read his books. She might
want the computer only to write her
midterm. But Dan knew she came from
a middle-class family and could hardly
afford the tuition, let alone her reading
fees. Books might be the only way she
could graduate
**********
Research in analysis (i.e., the evaluation
of the strengths and weaknesses of
computer system) is essential to the
development of effective security, both
for works protected by copyright law
and for information in general. Such
research can progress only through the
open publication and exchange of
complete scientific results
**********
I am very very very happy!
What about you?
**********
6
computer
books AND computer
books OR protected
NOT security
very
slick
Sample Output

want the computer only to write her
----------
computer system) is essential to the
==========
intend to read his books. She might
want the computer only to write her
fees. Books might be the only way she
==========
intend to read his books. She might
fees. Books might be the only way she
----------
for works protected by copyright law
==========
Of course, Lisa did not necessarily
intend to read his books. She might
want the computer only to write her
midterm. But Dan knew she came from
a middle-class family and could hardly
afford the tuition, let alone her reading
fees. Books might be the only way she
could graduate
----------
I am very very very happy!
What about you?
==========
I am very very very happy!
==========
Sorry, I found nothing.
==========

题目类型：简单字符处理+二分搜索

算法分析：根据题目的提示使用map<string, vector<node>>dict[maxn]来模拟倒插表，注意数据集中的单词为忽略大小写的

/**************************************************
filename       :j.cpp
author         :maksyuki
created time   :2018/1/24 18:08:06
last modified  :2018/1/24 21:50:52
file location  :C:\Users\abcd\Desktop\TheEternalPoet
***************************************************/

#pragma comment(linker, "/STACK:102400000,102400000")
#include <set>
#include <bitset>
#include <list>
#include <map>
#include <stack>
#include <queue>
#include <deque>
#include <string>
#include <vector>
#include <ios>
#include <iostream>
#include <fstream>
#include <sstream>
#include <iomanip>
#include <algorithm>
#include <utility>
#include <complex>
#include <numeric>
#include <functional>
#include <cmath>
#include <ctime>
#include <climits>
#include <cstdarg>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cctype>
#include <cassert>
using namespace std;

#define CFF freopen ("in", "r", stdin)
#define CFO freopen ("out", "w", stdout)
#define CPPFF ifstream cin ("in")
#define CPPFO ofstream cout ("out")
#define	DB(ccc)	cout << #ccc << " = " << ccc << endl
#define	DBT printf("time used: %.2lfs\n", (double) clock() / CLOCKS_PER_SEC)
#define PB push_back
#define MP(A, B) make_pair(A, B)

typedef long long LL;
typedef unsigned long long ULL;
typedef double DB;
typedef pair <int, int> PII;
typedef pair <int, bool> PIB;

const int INF = 0x7F7F7F7F;
const int MOD = 1e9 + 7;
const double EPS = 1e-10;
const double PI = 2 * acos (0.0);
const int maxn = 1e2 + 66;

struct node {
    int va, vb;
    node () {}
    node (int aa, int bb): va(aa), vb(bb) {}

    bool operator < (const node &a) const {
        if(va != a.va) return va < a.va;
        return vb < a.vb;
    }
    bool operator == (const node &a) const {
        if(va == a.va && vb == a.vb) return true;
        return false;
    }
};

map<string, vector<node>> dict;
vector<string> word[maxn];
vector<node> ans;
bool is_find;

void output() {
    int aalen = ans.size();
    if(!aalen) is_find = false;
    else {
        sort(ans.begin(), ans.end());
        //DB(aalen);
        aalen = unique(ans.begin(), ans.end()) - ans.begin();
        //DB(aalen);

        bool is_first = true;
        int vvv = -1;
        for(int i = 0; i < aalen; i++) {
            if(is_first) {
                is_first = false;
                vvv = ans[i].va;
            }
            else {
                if(vvv != ans[i].va) {
                    vvv = ans[i].va;
                    cout << "----------" << endl;
                }
            }
            cout << word[ans[i].va][ans[i].vb-1] << endl;
            //DB(i);
        }
    }
}

int main()
{
#ifdef LOCAL
    CFF;
    CFO;
#endif
    
    int n;
    cin >> n;

    string s;
    stringstream ss;
    getline(cin, s);
    for(int i = 1; i <= n; i++) {
        int row = 0;
        while(getline(cin, s)) {
            if(s == "**********") break;
            
            word[i].emplace_back(s);
            row++;
            for(int j = 0; s[j]; j++) {
                if(!isalpha(s[j])) s[j] = ' ';
                else s[j] = tolower(s[j]); 
            }
            
            ss.clear();
            ss << s;
            while(ss >> s) dict[s].emplace_back(node(i, row));
        }
    }
    
    int m;
    cin >> m;
    getline(cin, s);
    for(int i = 1; i <= m; i++) {
        getline(cin, s);
        //DB(s);
        is_find = true;
        if(s.find("OR") != string::npos) {
            ss.clear();
            ss << s;
            string sa, sb;
            ss >> sa >> s >> sb;
            
            for(int j = 0; sa[j]; j++) sa[j] = tolower(sa[j]);
            for(int j = 0; sb[j]; j++) sb[j] = tolower(sb[j]);

            ans.clear();
            int alen = dict[sa].size(), blen = dict[sb].size();
            for(int j = 0; j < alen; j++) ans.emplace_back(dict[sa][j]);
            for(int j = 0; j < blen; j++) ans.emplace_back(dict[sb][j]);
            
            output();
        }
        else if(s.find("AND") != string::npos) {
            ss.clear();
            ss << s;
            string sa, sb;
            ss >> sa >> s >> sb;

            for(int j = 0; sa[j]; j++) sa[j] = tolower(sa[j]);
            for(int j = 0; sb[j]; j++) sb[j] = tolower(sb[j]);
            ans.clear();
            int alen = dict[sa].size(), blen = dict[sb].size();
            for(int j = 0; j < alen; j++) {
                for(int k = 0; k < blen; k++)
                    if(dict[sa][j].va == dict[sb][k].va) {
                        ans.emplace_back(dict[sa][j]);
                        ans.emplace_back(dict[sb][k]);
                    }
            }
            output(); 
            //DB("hello");
        }
        else if(s.find("NOT") != string::npos) {
            ss.clear();
            ss << s;
            string sa;
            ss >> s >> sa;
            
            for(int j = 0; sa[j]; j++) sa[j] = tolower(sa[j]);

            int cnt[maxn];
            for(int j = 0; j < maxn; j++) cnt[j] = 1;

            int alen = dict[sa].size();
            for(int j = 0; j < alen; j++)
                cnt[dict[sa][j].va] = 0;

            bool is_output = false, is_first = true;
            for(int j = 1; j <= n; j++) {
                if(cnt[j]) {
                    is_output = true;
                    if(is_first) is_first = false;
                    else cout << "----------" << endl;

                    int wlen = word[j].size();
                    for(int k = 0; k < wlen; k++)
                        cout << word[j][k] << endl;
                }
            }
            if(!is_output) is_find = false;
        }
        else {
            for(int j = 0; s[j]; j++) s[j] = tolower(s[j]);

            int len = dict[s].size();
            if(!len) is_find = false;
            else {
                bool is_first = true;
                int vvv = -1;

                ans.clear();
                for(int j = 0; j < len; j++) ans.emplace_back(dict[s][j]);
                int alen = unique(ans.begin(), ans.end()) - ans.begin();

                for(int j = 0; j < alen; j++) {
                    if(is_first) {
                        is_first = false;
                        vvv = ans[j].va;
                    }
                    else {
                        if(vvv != ans[j].va) {
                            vvv = ans[j].va;
                            cout << "----------" << endl;
                        }
                    }
                    cout << word[ans[j].va][ans[j].vb-1] << endl;
                }
            }
        }
        if(!is_find) cout << "Sorry, I found nothing." << endl;
        cout << "==========" << endl; 
    }
    return 0;
}

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

/**************************************************

filename :j.cpp

author :maksyuki

created time :2018/1/24 18:08:06

last modified :2018/1/24 21:50:52

file location :C:\Users\abcd\Desktop\TheEternalPoet

***************************************************/

#pragma comment(linker, "/STACK:102400000,102400000")

#include <set>

#include <bitset>

#include <list>

#include <map>

#include <stack>

#include <queue>

#include <deque>

#include <string>

#include <vector>

#include <ios>

#include <iostream>

#include <fstream>

#include <sstream>

#include <iomanip>

#include <algorithm>

#include <utility>

#include <complex>

#include <numeric>

#include <functional>

#include <cmath>

#include <ctime>

#include <climits>

#include <cstdarg>

#include <cstdio>

#include <cstdlib>

#include <cstring>

#include <cctype>

#include <cassert>

using namespace std;

#define CFF freopen ("in", "r", stdin)

#define CFO freopen ("out", "w", stdout)

#define CPPFF ifstream cin ("in")

#define CPPFO ofstream cout ("out")

#define DB(ccc) cout << #ccc << " = " << ccc << endl

#define DBT printf("time used: %.2lfs\n", (double) clock() / CLOCKS_PER_SEC)

#define PB push_back

#define MP(A, B) make_pair(A, B)

typedef long long LL;

typedef unsigned long long ULL;

typedef double DB;

typedef pair <int, int> PII;

typedef pair <int, bool> PIB;

const int INF = 0x7F7F7F7F;

const int MOD = 1e9 + 7;

const double EPS = 1e-10;

const double PI = 2 * acos (0.0);

const int maxn = 1e2 + 66;

struct node {

int va, vb;

node () {}

node (int aa, int bb): va(aa), vb(bb) {}

bool operator < (const node &a) const {

if(va != a.va) return va < a.va;

return vb < a.vb;

}

bool operator == (const node &a) const {

if(va == a.va && vb == a.vb) return true;

return false;

}

};

map<string, vector<node>> dict;

vector<string> word[maxn];

vector<node> ans;

bool is_find;

void output() {

int aalen = ans.size();

if(!aalen) is_find = false;

else {

sort(ans.begin(), ans.end());

//DB(aalen);

aalen = unique(ans.begin(), ans.end()) - ans.begin();

//DB(aalen);

bool is_first = true;

int vvv = -1;

for(int i = 0; i < aalen; i++) {

if(is_first) {

is_first = false;

vvv = ans[i].va;

}

else {

if(vvv != ans[i].va) {

vvv = ans[i].va;

cout << "----------" << endl;

}

cout << word[ans[i].va][ans[i].vb-1] << endl;

//DB(i);

}

int main()

{

#ifdef LOCAL

CFF;

CFO;

#endif

int n;

cin >> n;

string s;

stringstream ss;

getline(cin, s);

for(int i = 1; i <= n; i++) {

int row = 0;

while(getline(cin, s)) {

if(s == "**********") break;

word[i].emplace_back(s);

row++;

for(int j = 0; s[j]; j++) {

if(!isalpha(s[j])) s[j] = ' ';

else s[j] = tolower(s[j]);

}

ss.clear();

ss << s;

while(ss >> s) dict[s].emplace_back(node(i, row));

}

int m;

cin >> m;

getline(cin, s);

for(int i = 1; i <= m; i++) {

getline(cin, s);

//DB(s);

is_find = true;

if(s.find("OR") != string::npos) {

ss.clear();

ss << s;

string sa, sb;

ss >> sa >> s >> sb;

for(int j = 0; sa[j]; j++) sa[j] = tolower(sa[j]);

for(int j = 0; sb[j]; j++) sb[j] = tolower(sb[j]);

ans.clear();

int alen = dict[sa].size(), blen = dict[sb].size();

for(int j = 0; j < alen; j++) ans.emplace_back(dict[sa][j]);

for(int j = 0; j < blen; j++) ans.emplace_back(dict[sb][j]);

output();

}

else if(s.find("AND") != string::npos) {

ss.clear();

ss << s;

string sa, sb;

ss >> sa >> s >> sb;

for(int j = 0; sa[j]; j++) sa[j] = tolower(sa[j]);

for(int j = 0; sb[j]; j++) sb[j] = tolower(sb[j]);

ans.clear();

int alen = dict[sa].size(), blen = dict[sb].size();

for(int j = 0; j < alen; j++) {

for(int k = 0; k < blen; k++)

if(dict[sa][j].va == dict[sb][k].va) {

ans.emplace_back(dict[sa][j]);

ans.emplace_back(dict[sb][k]);

}

output();

//DB("hello");

}

else if(s.find("NOT") != string::npos) {

ss.clear();

ss << s;

string sa;

ss >> s >> sa;

for(int j = 0; sa[j]; j++) sa[j] = tolower(sa[j]);

int cnt[maxn];

for(int j = 0; j < maxn; j++) cnt[j] = 1;

int alen = dict[sa].size();

for(int j = 0; j < alen; j++)

cnt[dict[sa][j].va] = 0;

bool is_output = false, is_first = true;

for(int j = 1; j <= n; j++) {

if(cnt[j]) {

is_output = true;

if(is_first) is_first = false;

else cout << "----------" << endl;

int wlen = word[j].size();

for(int k = 0; k < wlen; k++)

cout << word[j][k] << endl;

}

if(!is_output) is_find = false;

}

else {

for(int j = 0; s[j]; j++) s[j] = tolower(s[j]);

int len = dict[s].size();

if(!len) is_find = false;

else {

bool is_first = true;

int vvv = -1;

ans.clear();

for(int j = 0; j < len; j++) ans.emplace_back(dict[s][j]);

int alen = unique(ans.begin(), ans.end()) - ans.begin();

for(int j = 0; j < alen; j++) {

if(is_first) {

is_first = false;

vvv = ans[j].va;

}

else {

if(vvv != ans[j].va) {

vvv = ans[j].va;

cout << "----------" << endl;

}

cout << word[ans[j].va][ans[j].vb-1] << endl;

}

if(!is_find) cout << "Sorry, I found nothing." << endl;

cout << "==========" << endl;

}

return 0;

}

随机文章

Codeforces Round #308(Div.2) (5/5)

poj3723

HihoCoder挑战赛14(2/4)

poj3259

BestCoder Round #65 (4/5)

uva1368

uva1596

uva12504

超客的水墨烟雨

uva1597

随机文章

分类目录

标签云

书签

近期评论

2026年6月
一	二	三	四	五	六	日
« 6月
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30