All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",Return:["AAAAACCCCC", "CCCCCAAAAA"].十個(gè)字符表示一個(gè)DNA序列,找出多次出現(xiàn)的序列。移動(dòng)頭尾兩個(gè)指針,用map存儲(chǔ)已經(jīng)探測(cè)過的序列,當(dāng)該序列之前出現(xiàn)過一次的時(shí)候,則將它加入答案中。
在由前一個(gè)位置的序列得到下一個(gè)位置的序列時(shí)可以采用位運(yùn)算的方式。
對(duì)AGCT四個(gè)字母編碼,0,1,2,3,則兩個(gè)bit可以表示一個(gè)字母,十個(gè)字符正好20bit,可以用int類型表示。
移動(dòng)到下一位時(shí)用上一個(gè)位置的int值向高位移動(dòng)2bit,然后或上新的2bit,在與0xfffff求與保留低20位。
class Solution {public: vector<string> findRepeatedDnaSequences(string s) { vector<string> res; if(s.size() < 10) return res; unordered_map<int, int> mp; unordered_map<char, int> id; id['A'] = 0; id['C'] = 1; id['G'] = 2; id['T'] = 3; int temp = 0; for(int i = 0; i < s.size(); i++) { temp = (temp<<2| (id[s[i]] & 3)) & 0xfffff; if(i > 8) { if(mp[temp] == 1) res.push_back(s.substr(i - 9,10)); mp[temp]++; } } return res; }};
新聞熱點(diǎn)
疑難解答
圖片精選
網(wǎng)友關(guān)注