Twitter: Useful Tool for Machine Learning (ML)
Data is everywhere. That abundance can look like a pure benefit, but at times there is simply too much of it. When the volume is large, analysis becomes difficult: it is hard to see what is really there and how it may be useful. Whether the topic is InfoSec or the latest system upgrades, methods and tools are needed to manage that volume.
One potential source of this data is Twitter. People and businesses tweet about nearly everything: food, their present mood, politics, or any number of other subjects. ML is one field that has made use of this aspect of Twitter. The platform is a rich source for data mining on virtually any subject, and it is free for people to express their opinions or thoughts. This low barrier to entry lets anyone contribute, which other venues have not always allowed. At times the results may be slightly skewed by trolls. Given the overall number of entries, this skew is unlikely to be significant, and most of it could be removed with a script.
One such application is a recent study on opioid abuse. Tim Mackey, Janani Kalyanam, and Takeo Katsuki published their research on detecting prescription opioid abuse promotion and access using Twitter in the American Journal of Public Health (http://alphapublications.org/doi/pdfplus/10.2105/AJPH.2017.303994). Their methodology involved collecting publicly accessible tweets, filtered on terms associated with prescription opioids. The researchers then applied unsupervised machine learning, in the form of topic modeling, to the collected tweets.
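As a rough illustration of what topic modeling over a set of tweets involves, the sketch below implements a small latent Dirichlet allocation (LDA) sampler in plain Python. LDA is a stand-in chosen for familiarity; the text above does not say which topic model the researchers used. The sample tweets, stop-word list, function names, and parameter values are all invented for the example.

```python
import random
import re
from collections import Counter

# Tiny stop-word list, invented for this example.
STOP_WORDS = {"a", "the", "my", "for", "and", "to", "of", "is", "in", "on", "at"}


def tokenize(text):
    """Lowercase a tweet and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]        # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Invented placeholder tweets, not taken from the study.
sample_tweets = [
    "Doctor prescribed codeine for my cough",
    "Pharmacy refill for codeine cough syrup",
    "New study on fentanyl overdose deaths",
    "Report links fentanyl to overdose deaths",
]
docs = [tokenize(t) for t in sample_tweets]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

On a real collection the number of topics and the priors `alpha` and `beta` would need tuning, and a maintained library would be a more sensible choice than a hand-written sampler.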
The sample analyzed was 619,937 tweets containing the terms codeine, percocet, fentanyl, vicodin, oxycontin, oxycodone, or hydrocodone, collected from June to November 2015. Of these, 1,778 tweets, or less than 1%, were identified as marketing the sale of controlled substances online. Of those 1,778, 90% had embedded links.
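The keyword filtering and the proportions above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the sample tweets are invented, and the study's actual collection pipeline is not described here. The counts 619,937 and 1,778 are the figures reported above.

```python
import re

# Search terms listed in the article.
OPIOID_TERMS = (
    "codeine", "percocet", "fentanyl", "vicodin",
    "oxycontin", "oxycodone", "hydrocodone",
)
TERM_PATTERN = re.compile(r"\b(?:" + "|".join(OPIOID_TERMS) + r")\b", re.IGNORECASE)
URL_PATTERN = re.compile(r"https?://\S+")


def mentions_opioid(text):
    """True if the tweet text contains any of the search terms as a whole word."""
    return TERM_PATTERN.search(text) is not None


def has_link(text):
    """True if the tweet text contains an http or https link."""
    return URL_PATTERN.search(text) is not None


# Invented placeholder tweets, not taken from the study.
sample_tweets = [
    "Picked up my Percocet refill today",
    "Great coffee this morning",
    "Article on #fentanyl risks: https://example.com/story",
]
matching = [t for t in sample_tweets if mentions_opioid(t)]
with_links = [t for t in matching if has_link(t)]
print(len(matching), "matching tweets,", len(with_links), "with a link")

# Share of the reported sample that was flagged as marketing sales online.
total_tweets = 619_937
flagged_tweets = 1_778
share_percent = flagged_tweets / total_tweets * 100
print(f"{share_percent:.2f}% of the sample")
```

The flagged tweets work out to roughly 0.29% of the sample, which is consistent with the "less than 1%" figure.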
While no research methodology is perfect, this one falls within the realm of acceptable protocols. ML takes data of this kind and greatly increases what can be done with it. Continued use and application of ML will further research not only at the base level but also in the understanding and comprehension of the data itself, along with its implications. This is only one of many examples where ML is well suited to the task. Applied to InfoSec, the same approach could be used to research compromises, data loss, or other subjects.
About the Author - Charles Parker, II has been working in the info sec field for over a decade, performing pen tests, vulnerability assessments, consulting with small- to medium-sized businesses to mitigate and remediate their issues, and preparing IT and info sec policies and procedures. Mr. Parker’s background includes work in the banking, medical, automotive, and staffing industries.