Generating Fake Trump Tweets with LSTM

Tokenizing the corpus and expanding every tweet into n-gram prefixes produces 1,188,946 training sequences: each prefix of a tweet is an input and the word that follows it is the label.

1188946
[212] 6
[212, 6] 92
[212, 6, 92] 26
[212, 6, 92, 26] 380
[212, 6, 92, 26, 380] 1079
[212, 6, 92, 26, 380, 1079] 25
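The preprocessing code itself is not reproduced above, but a minimal sketch of this step with the Keras Tokenizer could look as follows. The name `tweets` is a hypothetical list of cleaned tweet strings, and the empty `filters` argument is an assumption based on the generated samples, which keep hashtags, handles, and punctuation as separate tokens:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# `tweets` is a hypothetical list of cleaned tweet strings;
# empty filters keep '#', '@' and punctuation tokens intact
tokenizer = Tokenizer(filters='', lower=True)
tokenizer.fit_on_texts(tweets)
vocab_size = len(tokenizer.word_index) + 1   # 57,668 here, including the reserved index 0

X_seqs, y = [], []
for encoded in tokenizer.texts_to_sequences(tweets):
    # every prefix of a tweet becomes an input, the following word its label
    for i in range(1, len(encoded)):
        X_seqs.append(encoded[:i])
        y.append(encoded[i])

print(len(X_seqs))        # 1188946
print(X_seqs[0], y[0])    # [212] 6
```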
The input sequences are then zero-padded to a common length so they can be stacked into a single matrix:

array([[ 212,    0,    0, ...,    0,    0,    0],
       [ 212,    6,    0, ...,    0,    0,    0],
       [ 212,    6,   92, ...,    0,    0,    0],
       ...,
       [4996,  403,  104, ...,    0,    0,    0],
       [4996,  403,  104, ...,    0,    0,    0],
       [4996,  403,  104, ...,    0,    0,    0]])
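Judging by the trailing zeros in the matrix above, the sequences were padded at the end. With Keras this might look like the following, continuing the names from the previous sketch:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_len = max(len(s) for s in X_seqs)   # length of the longest prefix
X = pad_sequences(X_seqs, maxlen=max_len, padding='post')
y = np.array(y)                         # keep labels as integer word indices

print(X.shape)   # (1188946, max_len)
```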
One-hot encoding the 1,188,946 labels over the 57,668-word vocabulary, however, immediately runs out of memory: that matrix alone would take 255 GiB as float32.

MemoryError: Unable to allocate 255. GiB for an array with shape (1188946, 57668) and data type float32
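The original model definition is not part of this excerpt, so the layer sizes below are illustrative assumptions. The sketch shows one common workaround for the memory error: keep the labels as integer indices and train with sparse_categorical_crossentropy, so the one-hot matrix is never materialized.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

# Layer sizes are assumptions; the post notes that a shallower, wider
# network worked better than a deep one with fewer units.
model = Sequential([
    Embedding(vocab_size, 100, input_length=max_len),
    LSTM(256),
    Dropout(0.2),
    Dense(vocab_size, activation='softmax'),
])

# The sparse loss accepts integer labels directly and avoids the
# 255 GiB one-hot target matrix.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(X, y, batch_size=128, epochs=20)
```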
Here are some samples generated by the trained model, each starting from a short seed phrase:

democrat senators are doing a great job . i am not

democratic states , the democrats are not going to be a total disaster .

republican senators have a great job for the great state of texas . he will be a great governor ! #maga #kag and , @senatorheitkamp. and , others , the people gop senators must stop the flights from the united states . obama ’s campaign is a total disaster . biden has been a total disaster . i will be back soon ! #maga #kag #tcot @foxbusiness oh well , i ’m not going to be a total mess .

republican senators are working hard to get the job done in the senate . we have a great state and , great healthcare ! we need strong borders and crime !

obama is a disaster for the people . he is a disaster . he is a great guy . he is a winner . he is a winner . he is a winner . he is a winner . he is a great guy and a great guy . he will be missed !

bernie sanders is lying to the people of the united states . he is a total mess . he is a total mess . he is a total mess . he is a total mess ! he is a total mess ! he is a corrupt politician ! a total witch hunt ! no collusion , no obstruction . the dems don ’t want to do it . he is a corrupt politician ! he is a corrupt politician ! he is a corrupt politician ! he is strong on crime , borders , and , the enemy of the people !

democrats stole election results . they are a disgrace to our country , and , we will win ! gop senators are working hard on the border crisis . the dems are trying to take over the border . they are now trying to take away our laws . biden will bring back our country , and we are going to win the great state of texas . we need you in a second election .
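The samples above were produced by repeatedly predicting the next word and appending it to the seed text. The exact generation loop is not reproduced in this excerpt; a sketch of the idea, reusing tokenizer, model, and max_len from the earlier sketches:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate(seed_text, n_words=50):
    """Append the most likely next word n_words times."""
    text = seed_text
    for _ in range(n_words):
        encoded = tokenizer.texts_to_sequences([text])[0]
        encoded = pad_sequences([encoded], maxlen=max_len, padding='post')
        probs = model.predict(encoded, verbose=0)[0]
        next_index = int(np.argmax(probs))   # greedy; sampling from probs adds variety
        next_word = tokenizer.index_word.get(next_index, '')
        if not next_word:
            break
        text += ' ' + next_word
    return text

print(generate('republican senators'))
```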
A few takeaways from this experiment:

  1. Garbage in, garbage out: 90% of success stems from good data. More careful preprocessing would help. For instance, you can try removing hashtags; I found that predictions fall into a "vicious circle" of hashtags when the model doesn't know what to predict next, and it simply outputs a stream of unrelated hashtags, which obviously doesn't have a lot of value. Another thing to try is dropping tweets that are too short or too long. A rough sketch of both filters follows the list.
  2. Model architecture. I was hoping to achieve better results with a deeper NN with fewer units, but apparently a shallower, wider NN worked better for me. You can experiment with the number of layers, the number of units, and the dropout rate.
  3. Replace the trainable Embedding layer with actual word embeddings, either trained on your own dataset or pre-trained ones such as GloVe or word2vec; a sketch of loading GloVe vectors into the Embedding layer is included after the list.
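For the preprocessing ideas in point 1, a rough sketch of a hashtag and length filter; the regex and the length bounds are arbitrary choices, not taken from the article:

```python
import re

def clean_tweets(tweets, min_words=5, max_words=40):
    """Strip hashtags and drop tweets that are too short or too long."""
    cleaned = []
    for t in tweets:
        t = re.sub(r'#\w+', '', t)          # remove hashtags
        t = re.sub(r'\s+', ' ', t).strip()  # collapse leftover whitespace
        if min_words <= len(t.split()) <= max_words:
            cleaned.append(t.lower())
    return cleaned
```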
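For point 3, pre-trained GloVe vectors can be loaded into the Embedding layer as initial weights. A sketch, assuming a downloaded glove.6B.100d.txt file and the tokenizer and vocab_size names from the earlier sketches:

```python
import numpy as np
from tensorflow.keras.layers import Embedding

embedding_dim = 100
embedding_matrix = np.zeros((vocab_size, embedding_dim))

# glove.6B.100d.txt: one word followed by its 100-dimensional vector per line
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        word, *vec = line.split()
        idx = tokenizer.word_index.get(word)
        if idx is not None and idx < vocab_size:
            embedding_matrix[idx] = np.asarray(vec, dtype='float32')

embedding_layer = Embedding(vocab_size, embedding_dim,
                            weights=[embedding_matrix],
                            input_length=max_len,
                            trainable=False)   # set True to fine-tune
```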

Miroslav Tushev

CS PhD @ LSU. Passionate about statistics, ML, and NLP.