Random Material Name Generator

I extracted the Archive of Title Names (87MiB) to a 330MiB Text File, and then ran that through the following:

cat input | awk -F  "_" '!$3' | iconv -f utf8 -t ascii//TRANSLIT | tr -cd '[:alpha:]\n' | tr '[:upper:]' '[:lower:]' | tr -s 'a-z' | awk 'length($0)>3' | sort | uniq -u >> output 

This shortened the File down from the original 16 Million to just 5.6 Million Lines and 72MiB.

Here is what each Pipe Segment is doing:

  1. open the File
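  2. Remove all lines that have 3 or more words in them (awk keeps a line when its third underscore-separated field is missing or empty). The Underscore is what Wikipedia uses instead of Spaces in its titles, because they end up in URLs.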
  3. Screw all this UTF-8 nonsense, I want easy to pronounce ASCII and transliterations!
  4. Just kill all the other Stuff, I want only the Alphabet and \n in this File.
  5. Make it all lowercase now!
  6. Kill all repetitive Characters like double-L or so.
  7. After all this is trimmed down, everything 3 or shorter should get cut out entirely.
  8. Sort the whole thing.
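  9. Deduplicate Entries. Careful: uniq -u throws out every Entry that occurs more than once instead of keeping one copy of it; plain uniq (or sort -u) would keep one.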
  10. And append to the Output File!
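
To see what the chain does on a small scale, here is a minimal sketch that wraps the filter part (everything between cat and the final >> output) in a shell function and feeds it a handful of made-up titles. The function name clean_titles and the sample titles are my own assumptions for illustration, not part of the original run. The sample is ASCII-only on purpose, because what iconv's //TRANSLIT does with accented characters depends on the platform and locale.

```shell
#!/bin/sh
# Hypothetical wrapper around the filter stages of the pipeline above.
# Reads titles on stdin, writes the cleaned names to stdout.
clean_titles() {
    awk -F "_" '!$3' |                  # drop titles with a non-empty third word
    iconv -f utf8 -t ascii//TRANSLIT |  # transliterate to plain ASCII
    tr -cd '[:alpha:]\n' |              # keep only letters and newlines
    tr '[:upper:]' '[:lower:]' |        # lowercase everything
    tr -s 'a-z' |                       # squeeze repeated letters (ll -> l)
    awk 'length($0)>3' |                # drop names of 3 letters or fewer
    sort |
    uniq -u                             # keep only names that occur exactly once
}

# Made-up sample titles in the underscore style described above.
result=$(printf '%s\n' \
    'Iron' 'Iron_ore' 'Iron_ore_mining' 'Tin' \
    'Copper' 'Copper' 'Steel_(alloy)' 'Zinc_oxide' | clean_titles)

printf '%s\n' "$result"
# prints:
# iron
# ironore
# stelaloy
# zincoxide
```

Iron_ore_mining goes because it has three words, Tin because it is too short, and both Copper lines vanish completely: that is uniq -u, which only prints lines that are not repeated. If one copy should survive, sort -u (or plain uniq) would do that instead.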