Twitter: Useful Tool for Machine Learning (ML)
Data is everywhere. This may seem to be purely a benefit; however, at times there is simply
too much of it. The sheer volume can make it difficult to understand what is really there and
how it may be useful. Whether researching InfoSec or the latest system upgrades, there should be
methods and tools available to alleviate this difficulty.
One potential source of this data is Twitter. People and businesses tweet about nearly
everything: food, their present mood, politics, or any number of other topics. ML is one
field that has made use of this aspect of Twitter. The platform is a rich source for data
mining on virtually any subject. It is also free for people to use to express their opinions
or thoughts. This lack of a barrier to entry allows anyone to contribute, which other venues
have not offered. At times, the results may be slightly skewed by trolls. Given the overall
number of entries, this skew should not be significant, and much of it could be removed with
a script.
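As a rough illustration of what such a cleanup script might look like, the following Python sketch drops repeated tweet text and caps how many tweets any single account contributes. The function name, the two heuristics, and the threshold are assumptions made for this example; they are not taken from any particular study, and real troll detection would likely need more than this.

```python
from collections import Counter


def filter_noise(tweets, max_per_user=3):
    """Return tweets with duplicates removed and per-user volume capped.

    `tweets` is a list of (user, text) pairs. Both heuristics and the
    default threshold are illustrative assumptions.
    """
    seen_texts = set()
    per_user = Counter()
    kept = []
    for user, text in tweets:
        # Normalize case and whitespace so trivial variations count as duplicates.
        normalized = " ".join(text.lower().split())
        if normalized in seen_texts:
            continue  # repeated text, possibly spam or copy-paste
        if per_user[user] >= max_per_user:
            continue  # keep one account from dominating the sample
        seen_texts.add(normalized)
        per_user[user] += 1
        kept.append((user, text))
    return kept
```

A threshold this simple will also discard some legitimate heavy users, so it would need tuning against the data actually collected.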
One such application was a recent study on opioid abuse. Tim Mackey, Janani Kalyanam, and
Takeo Katsuki published their research on detecting prescription opioid abuse promotion and
access using Twitter in the American Journal of Public Health
(http://alphapublications.org/doi/pdfplus/10.2105/AJPH.2017.303994). The researchers’
methodology involved collecting tweets from Twitter, limited to those that were publicly
accessible. Their search filter was for terms associated with opioid prescriptions. The
researchers then applied unsupervised machine learning, specifically topic modeling.
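The keyword-filtering step can be sketched in a few lines of Python. This is only an illustration of the idea, not the researchers' code: the function names are hypothetical, the term list uses the drug names reported for the study's sample (with hydrocodone spelled in full), and the topic-modeling stage is not shown.

```python
import re

# Drug names reported for the study's sample; matching is case-insensitive.
OPIOID_TERMS = [
    "codeine", "percocet", "fentanyl", "vicodin",
    "oxycontin", "oxycodone", "hydrocodone",
]

# Whole-word match on any of the terms (so "#codeine" matches, "percocets" does not).
_TERM_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, OPIOID_TERMS)) + r")\b",
    re.IGNORECASE,
)


def matching_terms(text):
    """Return the sorted, lowercased set of terms found in one tweet."""
    return sorted({match.lower() for match in _TERM_PATTERN.findall(text)})


def filter_tweets(texts):
    """Keep only the tweets that mention at least one term."""
    return [text for text in texts if _TERM_PATTERN.search(text)]
```

For example, `filter_tweets(["great dinner tonight", "Fentanyl is in the news"])` keeps only the second tweet. Whole-word matching is a design choice here; a real collection effort might also want plurals, misspellings, and slang.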
The sample analyzed consisted of 619,937 tweets containing the terms codeine, percocet, fentanyl,
vicodin, oxycontin, oxycodone, and hydrocodone. The sample period ran from June to November
2015. Of these, 1,778 tweets, or less than 1%, were identified as marketing the sale of
controlled substances online. Of those 1,778 tweets, 90% had embedded links.
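To put those figures in proportion, the short Python calculation below works out the shares from the numbers reported above. The variable names are illustrative, and the count of tweets with links is only an approximation derived from the rounded 90% figure, not a number given in the study.

```python
# Figures as reported in the text above.
total_tweets = 619_937
flagged_tweets = 1_778
link_share = 0.90  # rounded share of flagged tweets with embedded links

# Share of the full sample that was flagged as marketing controlled substances.
flagged_share = flagged_tweets / total_tweets

# Approximate number of flagged tweets containing links (derived, not reported).
approx_with_links = round(flagged_tweets * link_share)

print(f"Flagged share of sample: {flagged_share:.2%}")
print(f"Approximate flagged tweets with links: {approx_with_links}")
```

The flagged share works out to roughly 0.29% of the sample, which shows how small the signal is relative to the volume of tweets collected.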
While no research methodology is perfect, this one falls within the realm of acceptable
protocols. ML has greatly increased the potential of this kind of analysis. Its continued use
and application will further research not only at the base level but also in the understanding
and comprehension of the data itself, along with its implications. This is only one example of
the many areas where ML could be exceptionally useful. Applied to InfoSec, the same approach
could be used to research compromises, data loss, or other subjects.
About the Author - Charles Parker, II has been working in the info sec field for over a decade, performing pen tests, vulnerability assessments, consulting with small- to medium-sized businesses to mitigate and remediate their issues, and preparing IT and info sec policies and procedures. Mr. Parker’s background includes work in the banking, medical, automotive, and staffing industries.