stats201-Shiran-PS2
Shiran Yuan’s STATS 201 Repository
This is Shiran Yuan’s repository for the course STATS 201.
Nota Bene: This page contains figures generated by matplotlib.pyplot with transparent backgrounds. If you are viewing this on the rising-stars-by-sunshine.github.io website, light mode is recommended over dark mode for the best viewing experience.
Bio
Shiran Yuan was born in September 2007. He dropped out of primary school in second grade in 2016 (age 8) after receiving a scholarship, and has been self-studying since then.
In 2017 (age 9), he received an IELTS band score of 7.
In 2018 (age 10), he completed the Shing-Tung Yau Mathcamp course at Tsinghua University, received Honor Roll of Distinction in the Second Round of the Mathematics League Competition (Grades 9-12 Track), earned High Honors in the Johns Hopkins Center for Talented Youth’s Talent Search Program, received a TOEFL score of 109, and was awarded the AP Scholar with Honor Award from the College Board.
In 2019 (age 11), he received the AP Scholar with Distinction Award from the College Board and a DELF B2 (French language) certificate.
In 2020 (age 12), he completed the Hsue-Shen Tsien Excellence in Engineering Program Summer Camp course at Tsinghua University.
In 2021 (age 13), he was admitted to DKU with a full scholarship and received the top prize at the X-Institute (an academic research institution founded by Tsinghua University) Summer Camp.
In 2022 (age 14), he became an X-Institute Scholar and received a Silver Medal in the iGEM Competition.
His current intended major is Applied Mathematics and Computational Sciences (Computer Science Track).
Project Information
In this problem set, I am going to explore tweets about data privacy/security in blockchain over the past 10 years (Jan 1st 2013 to Dec 31st 2022). The following are the research questions:
- Part I (Explanation): What are the main keywords and topics of tweets about data privacy/security in blockchain over the past 10 years?
- Part II (Prediction): Is data privacy/security in blockchain projected to become increasingly or decreasingly popular as a topic on Twitter in 2023?
The project contains 2 data files, 5 code files, and 6 spotlight figures. All data and code were produced independently by the author.
Data
All data files are stored within the data directory.
- Explanation_Data.txt: Contains the raw content of all queried tweets.
- PS2_Data.txt: Contains the number of queried tweets for each day from Jan 1st 2013 to Dec 31st 2022. Days without queried tweets are not listed.
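Because PS2_Data.txt omits days with no queried tweets, monthly totals such as those in spotlight1.png have to treat the missing days (and any month with no tweets at all) as zero. A minimal sketch follows, assuming a hypothetical line format of "YYYY-MM-DD count" per day; the actual file layout is not documented here, and monthly_counts is an illustrative name, not a function from the notebooks.

```python
from collections import Counter
from datetime import date


def monthly_counts(lines, start=(2013, 1), end=(2022, 12)):
    """Sum sparse daily counts into calendar months.

    Assumes (hypothetically) each line looks like "2016-06-17 42".
    Days absent from the input contribute 0, and months with no
    tweets at all still appear in the result with a count of 0.
    """
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        day_str, count_str = line.split()
        d = date.fromisoformat(day_str)
        totals[(d.year, d.month)] += int(count_str)

    # Walk every month in [start, end] so empty months are kept as 0.
    result = []
    year, month = start
    while (year, month) <= end:
        result.append(((year, month), totals[(year, month)]))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result
```

If the file matches the assumed format, reading it with open("data/PS2_Data.txt") and passing the file object to monthly_counts would yield 120 (month, total) pairs covering 2013 through 2022.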
Code
All code files are stored within the code directory.
- PS2_Explanation_Query.ipynb: Queries and processes the raw content of all queried tweets. Produces the data file Explanation_Data.txt.
- PS2_Explanation_Analyze.ipynb: Analyzes Explanation_Data.txt. Produces the spotlight figure spotlight6.png.
- PS2_Query_Data.ipynb: Queries tweets and records their dates. Produces the data file PS2_Data.txt.
- PS2_Process_Data.ipynb: Processes and visualizes PS2_Data.txt. Produces the spotlight figures spotlight1.png and spotlight2.png.
- PS2_Analyze_Data.ipynb: Conducts predictions based on PS2_Data.txt using a neural network with two hidden layers of 100 nodes each, trained for 100,000 iterations. Produces the spotlight figures spotlight4.png and spotlight5.png.
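PS2_Analyze_Data.ipynb is described as using a neural network with two hidden layers of 100 nodes each, trained for 100,000 iterations; the framework, activation function, optimizer, and input encoding are not stated. The sketch below is a dependency-free, hypothetical stand-in under assumed choices: tanh hidden units, a linear output, per-sample stochastic gradient descent on squared error, and a single scalar input such as a scaled month index. TwoHiddenLayerMLP is an illustrative name, not code from the repository.

```python
import math
import random


def _init_layer(n_in, n_out, rng):
    """Weights w[j][i] (input i -> unit j) from U(-1/sqrt(n_in), 1/sqrt(n_in)); zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def _affine(weights, biases, x):
    """Compute weights @ x + biases for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


class TwoHiddenLayerMLP:
    """Scalar regressor: input -> tanh(hidden) -> tanh(hidden) -> linear output (assumed design)."""

    def __init__(self, hidden=100, seed=0):
        rng = random.Random(seed)
        self.layers = [
            _init_layer(1, hidden, rng),
            _init_layer(hidden, hidden, rng),
            _init_layer(hidden, 1, rng),
        ]

    def _forward(self, x):
        """Return the activations of every layer, starting with the input [x]."""
        acts = [[x]]
        for k, (w, b) in enumerate(self.layers):
            z = _affine(w, b, acts[-1])
            is_output = k == len(self.layers) - 1
            acts.append(z if is_output else [math.tanh(v) for v in z])
        return acts

    def predict(self, x):
        return self._forward(x)[-1][0]

    def fit(self, xs, ys, epochs=100_000, lr=0.01):
        """Per-sample stochastic gradient descent on 0.5 * (prediction - target)**2."""
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                acts = self._forward(x)
                delta = [acts[-1][0] - y]  # gradient w.r.t. the linear output
                for k in range(len(self.layers) - 1, -1, -1):
                    w, b = self.layers[k]
                    inp = acts[k]
                    if k > 0:
                        # Back-propagate with the pre-update weights through tanh' = 1 - tanh**2.
                        back = [
                            sum(w[j][i] * delta[j] for j in range(len(w))) * (1 - inp[i] ** 2)
                            for i in range(len(inp))
                        ]
                    for j, d in enumerate(delta):
                        for i, a in enumerate(inp):
                            w[j][i] -= lr * d * a
                        b[j] -= lr * d
                    if k > 0:
                        delta = back
        return self
```

With the stated sizes (hidden=100, epochs=100_000) this pure-Python loop would be very slow; it is meant to show the architecture, and a numerical library would normally be used for a run at that scale. Scaling the month index and the counts to roughly unit range before fitting generally helps tanh networks train.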
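PS2_Explanation_Analyze.ipynb turns Explanation_Data.txt into the word cloud in spotlight6.png, which rests on keyword frequencies. The notebook's exact preprocessing is not described, so the following is only a sketch of one common approach, assuming a small hypothetical stopword list and assuming the search terms themselves are excluded (otherwise they would dominate the cloud). keyword_frequencies and STOPWORDS are illustrative names.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list; a real analysis would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "at", "by", "for", "from", "in", "is",
    "it", "of", "on", "or", "rt", "that", "the", "this", "to", "with",
}


def keyword_frequencies(tweets, query_terms=(), top_n=50):
    """Return the top_n most common lowercase words across tweets.

    URLs and @mentions are stripped, '#' is dropped from hashtags, and
    stopwords plus the (assumed) search terms are skipped.
    """
    skip = STOPWORDS | {term.lower() for term in query_terms}
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        for token in re.findall(r"[a-z][a-z0-9']+", text):
            if token not in skip:
                counts[token] += 1
    return counts.most_common(top_n)
```

The resulting (word, count) pairs could be passed as a dict to a word-cloud renderer, for example WordCloud.generate_from_frequencies from the third-party wordcloud package; whether the original notebook used that package is not stated.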
Spotlights
- spotlight1.png: The number of queried tweets per month.
- spotlight2.png: The number of queried tweets per day, together with a version smoothed by weekly averages.
- spotlight3.png: An intuitive display of how the queried data relate to the DAO hack incident and the subsequent SEC ruling.
- spotlight4.png: Predicted trend in the monthly number of queried tweets from 2013 to 2025 (2021-2022 were held out for testing; 2023-2025 are predictions of the future).
- spotlight5.png: A residual plot corresponding to spotlight4.png. Since the residuals show no apparent pattern, the regression appears to fit the data well.
- spotlight6.png: A word cloud of the keywords of the queried tweets.
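spotlight2.png smooths the daily counts by weekly averages. One reading of that phrase is a trailing 7-day moving average over a dense daily series in which the days missing from PS2_Data.txt count as zero; the sketch below assumes that interpretation (non-overlapping calendar-week bins would be another), and both helper names are hypothetical.

```python
from datetime import date, timedelta


def dense_daily(counts, start, end):
    """Expand a sparse {date: count} mapping into one entry per day (0 when absent)."""
    n_days = (end - start).days + 1
    return [counts.get(start + timedelta(days=i), 0) for i in range(n_days)]


def weekly_moving_average(series, window=7):
    """Trailing moving average: entry k averages series[k .. k + window - 1].

    The first window - 1 days have no full week behind them, so the output
    is window - 1 entries shorter than the input.
    """
    if len(series) < window:
        return []
    running = sum(series[:window])
    out = [running / window]
    for i in range(window, len(series)):
        # Slide the window one day: add the new day, drop the oldest.
        running += series[i] - series[i - window]
        out.append(running / window)
    return out
```

Keeping a running sum makes each step constant-time, so smoothing the roughly 3,650 days between 2013 and 2022 is cheap.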
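spotlight4.png holds out 2021-2022 for testing, and spotlight5.png judges the fit by the absence of a visible pattern in the residuals. The sketch below shows a chronological split (no shuffling, so every test month lies after every training month) and residuals computed as observed minus fitted. The lag-1 autocorrelation is an added, assumed numeric check rather than something the project reports: values near zero are consistent with a patternless residual plot, while values near +1 or -1 point to structure the model missed. It assumes monthly data as a list of ((year, month), count) pairs, and all names are illustrative.

```python
def chronological_split(monthly, test_start=(2021, 1)):
    """Split ((year, month), count) pairs into training months before
    test_start and held-out months from test_start on, preserving order."""
    train = [(ym, c) for ym, c in monthly if ym < test_start]
    test = [(ym, c) for ym, c in monthly if ym >= test_start]
    return train, test


def residuals(observed, fitted):
    """Residual for each month: observed count minus the model's fitted value."""
    return [o - f for o, f in zip(observed, fitted)]


def lag1_autocorrelation(res):
    """Correlation of each residual with the previous one (hypothetical check).

    Near 0: no obvious serial pattern; near +1 or -1: leftover structure.
    """
    n = len(res)
    if n < 2:
        return 0.0
    mean = sum(res) / n
    dev = [r - mean for r in res]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / denom
```

Tuples compare element by element in Python, so (2020, 12) < (2021, 1) holds and the split needs no date parsing.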
Conclusions
The following conclusions are reached from this project.
- The main keywords of tweets about data privacy/security in blockchain over the past 10 years include: new technologies (e.g. artificial intelligence, cryptocurrencies, big data), major blockchain platforms (e.g. Bitcoin, Ethereum), major security threats (e.g. ransomware, data breach, malware), and related fields of application (e.g. healthcare, biometrics, cryptography).
- The popularity of data privacy/security in blockchain as a Twitter topic is projected to recover from its previous decrease and slowly increase in the near future.
- The SEC ruling on the DAO hack incident seems to have had more impact on the popularity of data privacy/security in blockchain as a topic than the incident itself. (This was an unexpected finding outside the predetermined research questions.)