Designing Classes and Programs

Searching, Maps, Tables (hashing)
Searching is a fundamentally important operation
    We want to search quickly, very very quickly
    Consider searching using google.com, ACES, issues?
In general we want to search in a collection for a key
    Recall search in readsettree.cpp, readsetlist2.cpp
    Tree implementation was quick
    Vector of linked lists was fast, but how to make it faster?
If we compare keys, we cannot do better than log n to search n elements
    Lower bound is Ω(log n), provable
    Hashing is O(1) on average, not a contradiction, why?
CPS 100

From Google to Maps
If we wanted to write a search engine we'd need to access lots of pages and keep lots of data

Given a word, on what pages does it appear?
    This is a map of words->web pages
In general a map associates a key with a value
    Look up the key in the map, get the value
    Google: key is word/words, value is list of web pages
    Anagram: key is string, value is words that are anagrams
Interface issues
    Look up a key, return boolean: in map, or value: associated with the key (what if key not in map?)
    Insert a key/value pair into the map

Interface at work: tmapcounter.cpp
Key is a string, value is # occurrences
    Interface in code below shows how tmap class works

    while (input >> word) {
        if (map->contains(word)) {
            map->get(word) += 1;
        }
        else {
            map->insert(word,1);
        }
    }

What clues are there for prototype of map.get and map.contains?
    Reference is returned by get, not a copy, why?
    Parameters to contains, get, insert are same type, what?

Accessing values in a map (e.g., print)
We can apply a function object to every element in a map, this is called an internal iterator
    Simple to implement (why?), relatively easy to use
    See Printer class in tmapcounter.cpp
    Limited: must visit every map element (can't stop early)
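The interface questions and the internal iterator above can be sketched together in standard C++. This is only a sketch under stated assumptions, not the Tapestry code: WordMap, Totaler, countWords and totalWords are hypothetical names, the map is fixed to string->int and backed by std::map, and applyOne takes the key by const reference because std::map keys cannot be modified.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Applicant base class: called once per key/value pair.
class Applicant {
public:
    virtual ~Applicant() {}
    virtual void applyOne(const std::string& key, int& value) = 0;
};

// Hypothetical string->int map with the contains/get/insert interface from the slide.
class WordMap {
public:
    bool contains(const std::string& key) const {
        return myData.find(key) != myData.end();
    }
    // Returns a reference, so "get(word) += 1" updates the stored count in place.
    int& get(const std::string& key) {
        std::map<std::string, int>::iterator it = myData.find(key);
        assert(it != myData.end());   // assumption: caller checked contains() first
        return it->second;
    }
    void insert(const std::string& key, int value) {
        myData[key] = value;
    }
    // Internal iterator: the map drives the loop and visits every pair.
    void applyAll(Applicant& app) {
        std::map<std::string, int>::iterator it;
        for (it = myData.begin(); it != myData.end(); ++it) {
            app.applyOne(it->first, it->second);
        }
    }
private:
    std::map<std::string, int> myData;
};

// Same loop shape as the slide's counting code.
void countWords(std::istream& input, WordMap& map) {
    std::string word;
    while (input >> word) {
        if (map.contains(word)) {
            map.get(word) += 1;
        } else {
            map.insert(word, 1);
        }
    }
}

// Example applicant: adds up every count it is shown.
class Totaler : public Applicant {
public:
    Totaler() : total(0) {}
    virtual void applyOne(const std::string&, int& value) { total += value; }
    int total;
};

// Usage: count the words in a string, then total the counts via applyAll.
int totalWords(const std::string& text) {
    std::istringstream input(text);
    WordMap map;
    countWords(input, map);
    Totaler totaler;
    map.applyAll(totaler);
    return totaler.total;
}
```

Because applyAll owns the loop, Totaler has no way to stop partway through, which is the limitation noted above.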

Alternative: use Iterator subclass (see tmapcounter.cpp), this is called an external iterator
    Iterator has access to guts of a map, iterates over it
        Must be a friend class to access guts
        Tightly coupled: container and iterator
    Standard interface of Init, HasMore, Next, Current
    Can have several iterators at once, can stop early, can pass iterators around as parameters/objects

Internal iterator (applyAll/applyOne)
Applicant subclass: applied to key/value pairs stored in a map
    The applicant has an applyOne function, called from the map/collection, in turn, with each key/value pair
    The map/collection has an applyAll function to which is passed an instance of a subclass of Applicant

    class Printer : public Applicant<string,int>
    {
      public:
        virtual void applyOne(string& key, int& value)
        {
            cout << value << "\t" << key << endl;
        }
    };

Applicant class is templated on the type of key and value
    See tmap.h, tmapcounter.cpp, and other examples

From interface to implementation
First the name: STL uses map, Java uses map, we'll use map
    Other books/courses use table, dictionary, symbol table
    We've seen part of the map interface in tmapcounter.cpp
        What other functions might be useful?
    What's actually stored internally in a map?
The class tmap is a templated, abstract base class
    Advantage of templated class (e.g., tvector, tstack, tqueue)
    Base class permits different implementations
        UVMap, BSTMap, HMap (stores just string->value)
    Internally combine key/value into a pair
        pair is part of STL, the Standard Template Library

        Struct with two fields: first and second

External Iterator
The Iterator base class is templated on pair, makes for ugly declaration of iterator pointer
    (note: space between > > in code below is required, why?)

    Iterator<pair<string,int> > * it = map->makeIterator();
    for(it->Init(); it->HasMore(); it->Next()) {
        cout << it->Current().second << "\t";
        cout << it->Current().first << endl;
    }

We ask a map/container to provide us with an iterator
    We don't know how the map is implemented, just want an iterator
    Map object is an iterator factory: makes/creates iterator
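The external iterator and the factory idea above might look roughly like this in standard C++. It is a sketch, not the Tapestry implementation: VecMap, VecMapIterator and firstAtLeast are hypothetical names, the container is an unsorted vector of pairs, and insert does not check for duplicate keys.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Iterator interface with the four operations named on the slide.
template <class T>
class Iterator {
public:
    virtual ~Iterator() {}
    virtual void Init() = 0;
    virtual bool HasMore() const = 0;
    virtual void Next() = 0;
    virtual T& Current() = 0;
};

class VecMapIterator;   // defined below

// Hypothetical container: key/value pairs kept in a vector.
class VecMap {
public:
    typedef std::pair<std::string, int> Entry;
    void insert(const std::string& key, int value) {
        myEntries.push_back(Entry(key, value));   // assumption: keys are distinct
    }
    // Factory: the map creates the iterator, the caller deletes it.
    Iterator<Entry>* makeIterator();
private:
    friend class VecMapIterator;   // the iterator needs access to the guts
    std::vector<Entry> myEntries;
};

class VecMapIterator : public Iterator<VecMap::Entry> {
public:
    explicit VecMapIterator(VecMap& map) : myMap(map), myIndex(0) {}
    virtual void Init() { myIndex = 0; }
    virtual bool HasMore() const { return myIndex < myMap.myEntries.size(); }
    virtual void Next() { ++myIndex; }
    virtual VecMap::Entry& Current() {
        assert(HasMore());
        return myMap.myEntries[myIndex];
    }
private:
    VecMap& myMap;
    std::size_t myIndex;
};

Iterator<VecMap::Entry>* VecMap::makeIterator() {
    return new VecMapIterator(*this);
}

// The client controls the loop, so it can stop early: returns the first key
// whose value is at least threshold, or "" if there is none.
std::string firstAtLeast(VecMap& map, int threshold) {
    std::string found;
    Iterator<VecMap::Entry>* it = map.makeIterator();
    for (it->Init(); it->HasMore(); it->Next()) {
        if (it->Current().second >= threshold) {
            found = it->Current().first;
            break;
        }
    }
    delete it;
    return found;
}
```

The client code in firstAtLeast never touches the vector directly; only the friend iterator does, which is the tight coupling between container and iterator.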

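The Init/HasMore/Next loop above has a direct counterpart in standard C++. The sketch below uses std::map with operator[], count(), and begin()/end(); countWords, timesSeen and formatCounts are hypothetical helper names, not taken from tmapcounterstl.cpp.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Counts words from a stream; operator[] creates a zero count the first time
// a word is seen, so no separate contains/insert step is needed.
std::map<std::string, int> countWords(std::istream& input) {
    std::map<std::string, int> counts;
    std::string word;
    while (input >> word) {
        counts[word] += 1;
    }
    return counts;
}

// count() returns 0 or 1 for std::map, playing the role of contains.
int timesSeen(const std::map<std::string, int>& counts, const std::string& word) {
    if (counts.count(word) == 0) {
        return 0;
    }
    return counts.find(word)->second;
}

// begin()/end() with ++ replace Init/HasMore/Next; dereferencing the iterator
// replaces Current(). Output is "count<TAB>word", one pair per line, in key order.
std::string formatCounts(const std::map<std::string, int>& counts) {
    std::ostringstream out;
    std::map<std::string, int>::const_iterator it;
    for (it = counts.begin(); it != counts.end(); ++it) {
        out << it->second << "\t" << it->first << "\n";
    }
    return out.str();
}
```

Note that counts[word] on a missing key inserts it, which is why timesSeen checks count() first instead of using operator[] on a const map.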
Tapestry tmap v STL map
See comparable code in tmapcounterstl.cpp
    Instead of get, use overloaded [] operator
    Instead of contains, use count, which returns an int
Instead of Iterator class with Init, HasMore, ...
    Use begin() and end() for starting and ending values
    Use ++ to increment iterator [compare with Next()]
    Instead of Current(), dereference the iterator
STL map uses a balanced search tree, guaranteed O(log n)
    Nonstandard hash_map is tricky to use in general
    We'll see one way to do balanced trees later

Map example: finding anagrams
mapanagram.cpp, alternative program for finding anagrams
    Maps string (normalized): key to tvector<string>: value
    Look up normalized string, associate all "equal" strings with normalized form
    To print, loop over all keys, grab vector, print if ???
Each value in the map is list/collection of anagrams
    How do we look up this value?
    How do we create initial list to store (first time)?
    We actually store pointer to vector rather than vector
        Avoid map->get()[k], can't copy vector returned by get
See also mapanastl.cpp for standard C++ using STL
    The STL code is very similar to tapestry (and to Java!)

Hashing: Log (10^100) is a big number
Comparison based searches are too slow for lots of data
    How many comparisons needed for a billion elements?
    What if one billion web-pages indexed?
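One way to put numbers on the questions above, assuming the comparison-based search is a binary search over n sorted keys: its worst case is floor(log2 n) + 1 probes. The function names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Worst-case probes for binary search over n sorted keys:
// each probe halves the range until nothing is left.
int worstCaseProbes(unsigned long long n) {
    int probes = 0;
    while (n > 0) {
        n /= 2;
        ++probes;
    }
    return probes;
}

// Same quantity for n = 10^exponent, computed with logarithms because 10^100
// does not fit in a built-in integer type. For exponent >= 1 the value 10^exponent
// is never a power of two, so floor(log2 n) is not sensitive to rounding.
int probesForPowerOfTen(int exponent) {
    assert(exponent >= 0);
    double log2n = exponent * std::log2(10.0);
    return static_cast<int>(std::floor(log2n)) + 1;
}
```

Under these assumptions worstCaseProbes(1000000000ULL) is 30 and probesForPowerOfTen(100) is 333: logarithms grow slowly, but an average-case O(1) lookup avoids even this growth.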

Hashing is a search method that has average case O(1) search
    Worst case is very bad, but in practice hashing is good
    Associate a number with every key, use the number to store the key
        Like catalog in library, given book title, find the book
A hash function generates the number from the key
    Goal: Efficient to calculate
    Goal: Distributes keys evenly in hash table

Hashing details
    [Figure: hash table drawn as an array with slots 0, 1, 2, 3, ..., n-1]
There will be collisions, two keys will hash to the same value
    We must handle collisions, still have efficient search
    What about birthday paradox: using birthday as hash function, will there be collisions in a room of 25 people?
Several ways to handle collisions, in general array/vector used
    Linear probing, look in next spot if not found
        Hash to index h, try h+1, h+2, ..., wrap at end
        Clustering problems, deletion problems, growing problems
    Quadratic probing
        Hash to index h, try h+1^2, h+2^2, h+3^2, ..., wrap at end
        Fewer clustering problems
    Double hashing
        Hash to index h, with another hash function to j
        Try h, h+j, h+2j, ...

Chaining with hashing
With n buckets each bucket stores linked list
    Compute hash value h, look up key in linked list table[h]
    Hopefully linked lists are short, searching is fast
    Unsuccessful searches often faster than successful
        Empty linked lists searched more quickly than non-empty
    Potential problems?
Hash table details
    Size of hash table should be a prime number
    Keep load factor small: number of keys/size of table
    On average, with reasonable load factor, search is O(1)
    What if load factor gets too high? Rehash or other method

Hashing problems
Linear probing, hash(x) = x, (mod tablesize)
    Insert 24, 12, 45, 14, delete 24, insert 23 (where?)
    [Figure: table with slots 0-10 holding 12, 24, 45, 14]
Same numbers, use quadratic probing (clustering better?)
    [Figure: table with slots 0-10 holding 12, 24, 14, 45]
What about chaining, what happens?
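The exercise above can be traced in code. The sketch below makes several assumptions that are not on the slide: the table has 11 slots (indices 0-10), deletion uses a tombstone marker, keys are non-negative and distinct, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

const int TABLE_SIZE = 11;   // assumed: slots 0..10
const int EMPTY = -1;        // slot never held a key
const int DELETED = -2;      // tombstone: slot held a key that was removed

// i-th slot tried for key: h+i (linear) or h+i*i (quadratic), wrapping at the end.
int probeSlot(int key, int i, bool quadratic) {
    assert(key >= 0);
    int h = key % TABLE_SIZE;
    return (h + (quadratic ? i * i : i)) % TABLE_SIZE;
}

// Stores key in the first EMPTY or DELETED slot on its probe sequence.
// Returns the slot used, or -1 if the sequence finds no free slot.
int probeInsert(std::vector<int>& table, int key, bool quadratic) {
    for (int i = 0; i < TABLE_SIZE; ++i) {
        int slot = probeSlot(key, i, quadratic);
        if (table[slot] == EMPTY || table[slot] == DELETED) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

// Marks the key's slot DELETED rather than EMPTY, so searches for keys
// stored further along the same probe sequence do not stop too soon.
bool probeRemove(std::vector<int>& table, int key, bool quadratic) {
    for (int i = 0; i < TABLE_SIZE; ++i) {
        int slot = probeSlot(key, i, quadratic);
        if (table[slot] == EMPTY) return false;
        if (table[slot] == key) {
            table[slot] = DELETED;
            return true;
        }
    }
    return false;
}

// Insert 24, 12, 45, 14, delete 24, insert 23; returns the slot where 23 lands.
int slideSequence(std::vector<int>& table, bool quadratic) {
    table.assign(TABLE_SIZE, EMPTY);
    const int keys[] = {24, 12, 45, 14};
    for (std::size_t k = 0; k < 4; ++k) {
        probeInsert(table, keys[k], quadratic);
    }
    probeRemove(table, 24, quadratic);
    return probeInsert(table, 23, quadratic);
}

// Same sequence with chaining: each bucket is a linked list of keys.
std::vector<std::list<int> > chainSequence() {
    std::vector<std::list<int> > buckets(TABLE_SIZE);
    const int keys[] = {24, 12, 45, 14};
    for (std::size_t k = 0; k < 4; ++k) {
        buckets[keys[k] % TABLE_SIZE].push_back(keys[k]);
    }
    buckets[24 % TABLE_SIZE].remove(24);
    buckets[23 % TABLE_SIZE].push_back(23);
    return buckets;
}
```

With these assumptions, linear probing places 12, 24, 45, 14 in slots 1, 2, 3, 4, while quadratic probing places 45 in slot 5 and 14 in slot 3. In both cases 23 hashes to slot 1, finds it occupied, and reuses the tombstone left by 24 in slot 2. With chaining, 23 simply joins 12 and 45 in bucket 1.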

What about hash functions
Hashing often done on strings, consider two alternatives

    unsigned hash(const string& s)
    {
        unsigned int k, total = 0;
        for(k=0; k < s.length(); k++){
            total += s[k];
        }
        return total;
    }

Consider total += (k+1)*s[k], why might this be better?
    Other functions used, always mod result by table size
What about hashing other objects?
    Need conversion of key to index, not always simple
    HMap (subclass of tmap) maps string->values
    Why not any key type (only strings)?
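The two alternatives can be compared directly. In this sketch hashSum, hashWeighted and bucketFor are hypothetical names, and characters are cast to unsigned char (a change from the slide's code) so the result does not depend on whether char is signed.

```cpp
#include <cassert>
#include <string>

// Additive hash from the slide: the order of the characters does not matter.
unsigned int hashSum(const std::string& s) {
    unsigned int total = 0;
    for (std::string::size_type k = 0; k < s.length(); k++) {
        total += static_cast<unsigned char>(s[k]);
    }
    return total;
}

// Position-weighted variant: each character is scaled by its 1-based position,
// so rearranging the same letters usually changes the result.
unsigned int hashWeighted(const std::string& s) {
    unsigned int total = 0;
    for (std::string::size_type k = 0; k < s.length(); k++) {
        total += static_cast<unsigned int>(k + 1) * static_cast<unsigned char>(s[k]);
    }
    return total;
}

// Always reduce the hash value to a table index.
unsigned int bucketFor(unsigned int hashValue, unsigned int tableSize) {
    assert(tableSize > 0);
    return hashValue % tableSize;
}
```

With ASCII codes, hashSum gives 454 for both "stop" and "pots", so all anagrams collide, while hashWeighted gives 1128 and 1142. Distinct hash values can still share a bucket after the mod, so collisions are reduced, not eliminated.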

Why use inheritance?
We want to program to an interface (an abstraction, a concept)
    The interface may be concretely implemented in different ways, consider stream hierarchy

    void readStuff(istream& input){}

    // call function
    ifstream input("data.txt");
    readStuff(input);
    readStuff(cin);

    What about new kinds of streams, ok to use?
Open/closed principle of code development
    Code should be open to extension, closed to modification
    Why is this (usually) a good idea?

Nancy Leveson: Software Safety
Founded the field

Mathematical and engineering aspects
    Air traffic control
    Microsoft Word
"C++ is not state-of-the-art, it's only state-of-the-practice, which in recent years has been going backwards"
Software and steam engines: once extremely dangerous?
    http://sunnyday.mit.edu/steam.pdf
THERAC 25: Radiation machine that killed many people
    http://sunnyday.mit.edu/papers/therac.pdf
