Turning Heated Debates into LOLs: The Making of WhatsApp Sentinel

Got a Racist Uncle Ruining the Family Group Chat?

We’ve all been there—you’re just trying to enjoy a peaceful family group chat when someone, maybe that one uncle, drops a politically charged or inappropriate meme. It’s awkward, it’s uncomfortable, and it can spiral into a heated debate. But what if you could have a virtual bouncer step in and handle it for you?

Enter WhatsApp Sentinel, your AI-powered bot designed to keep your chats free of political drama and offensive content. Whether it’s responding with a snappy retort or recognizing sarcasm, this bot ensures that your group stays friendly and focused on the fun stuff.

What Does WhatsApp Sentinel Do?

WhatsApp Sentinel interacts directly with WhatsApp to monitor group chats. Here’s a breakdown of its key features:

  • Text and Image Processing: The bot can process both text and images, making it versatile enough to handle a variety of inputs.
  • Automated Responses: Based on predefined rules, the bot generates responses to messages that are flagged as potentially divisive.
  • Political Content Detection: The bot identifies political content and responds in a way that gently steers the conversation away from politics.
  • Media Handling: It downloads and processes media files from messages, ensuring that no content slips through the cracks.
  • Logging: All actions and errors are logged for easy debugging and monitoring.

One of the unique aspects of WhatsApp Sentinel is its disagreeable stance—no matter what political view a user expresses, the bot always disagrees. This may sound counterintuitive, but really its for the lulz.

How It Works

1. Agent Behavior (agent.py)
  • Custom Responses: The bot is designed to give tailored responses based on the content type. If it’s political, it responds with the opposite view. If it’s sarcastic, it disagrees with the intent. This adds a unique twist, ensuring that the bot steers conversations in a specific direction.
  • Image Processing: The bot not only processes text but can also interpret images. By converting images to base64 and responding accordingly, it makes the interaction richer.
2. Media Handling (media_processing.py)
  • Flexible Media Processing: The process_media function intelligently handles different media types, converting them as necessary. For example, WebP images are converted to GIFs, and MP4s are turned into screenshots. This ensures that the bot can work with various formats and extract the most relevant content.
  • Using OpenCV and PIL: Leveraging OpenCV and PIL for video and image processing showcases a mix of tools, making the media handling both efficient and versatile.
3. Chat Flow and Media Interaction (run.py)
  • Message Management: The bot is set up to handle both group and individual messages differently, identifying the sender and responding with either a direct reply or additional messages if the content is political.
  • Media Processing Integration: The media and text are tied together smoothly, allowing the bot to process incoming media (like images or videos) and then run them through the response agent to generate a relevant reply. This blend of text and media interaction makes the bot more dynamic.

Prompt Breakdown

The agent.py script uses a carefully crafted prompt to guide the AI’s behavior. Here’s a quick breakdown:

If the input is a political subject respond with an super inflammatory response that is opposite to the political position of the sender, otherwise explain the topic in 5 words or less.
if the input is an image you need to understand the content of the image and respond to it in addition to any text.
if the input is sarcastic then the response should disagree with the intended message behind the sarcasm. 

example input:
"legos are stupid"

example output: 
{"political": 0, output: "this is a message about legos"} 

example input 2: 
"trump is the best"

example output 2: 
{"political": 1, output: "Trump's rollback of environmental protections and withdrawal from the Paris Agreement showed a complete disregard for science and the future of our planet, prioritizing corporate profits over human survival.""} 

example input 3:
In accordance with the democratic norms, a Kangaroo it shall be if it chooses to identify as suchIn accordance with the democratic norms, a Kangaroo it shall be if it chooses to identify as such."

example output 3: 
{"political": 1, output: "Self-identification is a fundamental human right that promotes inclusivity and respect for individual experiences, challenging outdated and rigid societal norms."}

respond in json format
  1. Political Content:
    • Instruction: Respond with an extreme, inflammatory opinion opposite to the user’s political stance.
    • Purpose: To provoke strong reactions and fuel debate.
  2. Non-Political Content:
    • Instruction: Summarize the topic in 5 words or less.
    • Purpose: Verify the AI is parsing images correctly for debugging purposes.
  3. Image Content:
    • Instruction: Analyze the image and respond to its content, in addition to any accompanying text.
    • Purpose: Engage with visual information, not just text.
  4. Sarcasm:
    • Instruction: Disagree with the implied meaning behind sarcastic comments.
    • Purpose: Adds complexity by countering sarcasm instead of mirroring it.
  5. Examples:
    • Purpose: Demonstrate how the AI should apply these rules in real interactions.
  6. Response Format:
    • Instruction: Output responses in JSON format.
    • Purpose: Ensure consistency and machine-readability.

Technical Challenges and Solutions

One of the biggest hurdles in developing WhatsApp Sentinel was figuring out how to leverage OpenAI’s capabilities for handling both text and images. Initially, I tried using the OpenAI Assistant API for this purpose, only to discover that it doesn’t support passing media attachments. This limitation cost me quite a bit of time, so I eventually switched to the Chat API, which better accommodated my needs.

Another challenge was integrating WhatsApp with the bot. At first, I attempted to build a custom solution using Playwright and HTML parsing to interact with WhatsApp Web. However, this approach proved to be cumbersome and unreliable. Fortunately, I discovered the WPP_Whatsapp library, a Python port of a JavaScript library, which streamlined the entire process and made the WhatsApp integration much more manageable.

Conclusion

WhatsApp Sentinel is more than just a bot; it’s a tool for keeping your family group chat enjoyable and free from political squabbles. By combining advanced text and media processing with a touch of humor, it ensures that your chats remain friendly and fun. If you’re interested in contributing to WhatsApp Sentinel or using it in your own groups, check out the GitHub repository.

By me