NTP Crowd Sourcing

Why is this project required?

Our first year design class was approached by NTP with a question:

Can you design an application to crowd source tornado sightings around Canada?

My team developed the following need statement that our project would have to meet to be useful: NTP needs a way to efficiently compile relevant social media posts to collect more data points for storms.

Final Design Documentation

Since I was the team member with technical experience, I was tasked with implementing our design in code.

Dashboard

Item | Price | Description/Usage
Mapbox API Key | Free (50,000 map loads) | Used for the front-end map
Heroku Hosting | Free (Student Plan) | Hosting service that integrates easily with our existing version control. Used to host the back end of the application
Vercel Hosting | Free | Hosting for Next.js applications. Used to host the front end of the application
MongoDB Database | Free (depending on the hosting solution) | We use MongoDB for its NoSQL structure, which handles the complex objects Twitter itself uses in its backend. We host the database on a group member's computer; without that, it would cost an estimated $7/month
Auth0 | Free (not going to exceed usage) | Auth0 provides an easy-to-use authentication service, which we leveraged to give NTP user accounts for managing access to the application

The backend of the website is written in Python. At its core is snscrape, a scraping library with support for multiple social media platforms, of which we only use the Twitter package. With it, we can target specific locations in our queries, which gives it an edge over other libraries such as Tweepy.

To schedule the scraping, we create a separate thread for each query so they can run asynchronously. We chose BackgroundScheduler, an easy-to-use library that supports adding and removing jobs dynamically while the application is running. Each thread sends a request to Twitter at an interval equal to the frequency the user defined for that query.

The resulting Tweets then need to be stored before being sent to the user. We currently use MongoDB, which has a native connection library for Python. We connect to the remote database and add Tweets with an update operation, which ensures the database never contains duplicate Tweets with the same ID.

Once we have found the Tweets, we run them through the algorithm we developed to score their relevance to the selected keywords and their level of interaction, essentially how much the Tweet is trending. The resulting relatabilityScore determines how the Tweet is displayed to NTP: the higher the score, the more relevant and important the Tweet is to NTP. In the Python implementation, each Tweet is broken into its individual metrics, analyzed, and saved as a Tweet object. This code is seen below:

import math
import snscrape.modules.twitter as sntwitter
# Tweet is the project's own model class for a scored Tweet

def solveAlgo(query, tweets):

    # Initialize empty list of tweets
    tweetList = []

    for tweet in tweets:
        # Calculate the total media attached to the post
        mediaCount = 0
        media = []
        if tweet['media'] is not None:
            for m in tweet['media']:
                if type(m) == sntwitter.Photo:
                    mediaCount += 1
                    media.append({
                        'type': 'photo',
                        'url': m.fullUrl
                    })
                elif type(m) == sntwitter.Video:
                    mediaCount += 1
                    for videoType in m.variants:
                        if videoType.contentType != 'application/x-mpegURL':
                            media.append({
                                'type': 'video',
                                'url': videoType.url,
                                'contentType': videoType.contentType
                            })

        likes = tweet['likes']
        retweets = tweet['retweets']
        replies = tweet['replies']

        # Calculate the interaction score of the tweet
        interactionScore = 0
        if (likes + retweets + replies) != 0:
            interactionScore = (likes**2 + retweets**2 + replies) / math.sqrt(likes**2 + retweets**2 + replies**2)

        # Calculate the keyword count of the tweet
        keywordCount = 0
        for k in query.keywords:
            keywordCount += tweet['content'].lower().count(k.replace('(', '').replace(')', '').lower())

        # Calculate the relatability score of the tweet
        relatabilityScore = ((mediaCount) + (interactionScore)) * keywordCount

        # Default to query location if tweet location is not available
        location = {
            'type': 'Point',
            'coordinates': [float(query.location.split(',')[0]), float(query.location.split(',')[1])]
        }
        if tweet['coordinates'] is not None:
            location = {
                'type': 'Point',
                'coordinates': [tweet['coordinates'].longitude, tweet['coordinates'].latitude]
            }

        # Create a new tweet object
        _tweet = Tweet(
            tweet['id'],
            query.id,
            likes,
            retweets,
            replies,
            tweet['date'],
            location,
            tweet['content'],
            media,
            keywordCount,
            interactionScore,
            relatabilityScore
        )

        tweetList.append(_tweet)

    return tweetList
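
For reference, here is a minimal sketch of how the scheduling and storage described above could fit together, assuming APScheduler's BackgroundScheduler and the pymongo driver. It is not the project's actual scheduler module; the connection string, the 15-minute interval, and the run_query helper are placeholders.

import snscrape.modules.twitter as sntwitter
from apscheduler.schedulers.background import BackgroundScheduler
from pymongo import MongoClient

# Placeholder connection; the real application reads its MongoDB URI from configuration
client = MongoClient('mongodb://localhost:27017')
tweets_collection = client['ntp']['tweets']

def run_query(query_id, search_string):
    # Scrape Twitter for one saved query and upsert the results
    scraper = sntwitter.TwitterSearchScraper(search_string)
    for i, tweet in enumerate(scraper.get_items()):
        if i >= 100:  # cap the number of Tweets per run for this sketch
            break
        # Upserting on the Tweet ID means re-running a query never creates duplicates
        tweets_collection.update_one(
            {'_id': tweet.id},
            {'$set': {
                'queryId': query_id,
                'date': tweet.date,
                'content': tweet.content,  # renamed rawContent in newer snscrape releases
                'likes': tweet.likeCount,
                'retweets': tweet.retweetCount,
                'replies': tweet.replyCount,
            }},
            upsert=True,
        )

scheduler = BackgroundScheduler()
# One job per user-defined query, with the interval matching that query's frequency
scheduler.add_job(run_query, 'interval', minutes=15, id='query-1',
                  args=['query-1', 'tornado near:"London, Ontario" within:100km'])
scheduler.start()
# Jobs can be removed at runtime when a user deletes a query:
# scheduler.remove_job('query-1')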

This brings us to the connection between the front end and the back end. We use the HTTP protocol to transfer data in JSON format between the two. For communication, we chose the Flask library to create an HTTP server for the Python backend. We set up multiple endpoints that a client can interact with to read from and write to the database. Whenever the client needs to retrieve data, it uses a GET request, and whenever new data needs to be sent to the backend, it uses a POST request.
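
As a rough illustration of that request flow (not the project's actual routes), a minimal Flask server with one GET endpoint and one POST endpoint could look like the following; the route names and in-memory dictionaries are placeholders standing in for the MongoDB collections.

from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory stand-ins for the MongoDB collections used by the real backend
QUERIES = {}
TWEETS = {}

@app.route('/tweets/<query_id>', methods=['GET'])
def get_tweets(query_id):
    # The client retrieves stored Tweets for a query with a GET request
    return jsonify(TWEETS.get(query_id, []))

@app.route('/queries', methods=['POST'])
def create_query():
    # New data (here, a query definition) is sent to the backend with a POST request
    query = request.get_json()
    QUERIES[query['id']] = query
    return jsonify({'status': 'created'}), 201

if __name__ == '__main__':
    app.run(port=5000)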

Improvements

If we had more time, we would first improve the accuracy of our algorithm with a machine learning model trained on previous storms validated by NTP. Our current method works, but it was developed using roughly 10 tornadoes reported by NTP. With the Tweets we have already stored, we could build a model that detects better Tweets by combining image data with sentiment analysis to understand the context someone is tweeting in, so that more unique and relevant content is surfaced.

As a team, we also brainstormed a companion app for training the algorithm, where NTP could sift through Tweets much like Tinder: they would be shown Tweets we have found and decide whether each one is useful for detection. In doing so, NTP would be training the neural network described in the previous improvement.

Finally, we would develop a mobile-friendly version of the application. With the current app we were aiming for simplicity, but supporting another screen size would roughly double the layout code, since a separate grid layout would be needed for smaller, vertical screens. With more time, this would have been possible to implement.

What I Learned

I learned how valuable multiple ideas are within a team, and how good communication can lead to a better final product.

In the end, my team won the Client Choice Award, meaning that out of the seven other teams, ours created the best solution to NTP's problem. NTP is now using this project in their current tornado identification process. As tornadoes become more prevalent in Canada, I am very proud that my contribution can help predict tornadoes in the future and potentially save lives.

Source Code

The source code for the project and more detailed documentation can be found on my GitHub: the backend here and the frontend here.