Build a Weekly AI Trend Alerter with arXiv and Weaviate

Created by: Mary Newhauser (maryn)

Last update a day ago


Ditch the endless scroll for AI trends. Meet Archi, your personal AI research assistant that hits you up once a week with everything you need to know. 🧑🏽‍🔬

This workflow fetches AI and machine learning article abstracts from arXiv, enriches them with topic categories using an LLM, and embeds them in a Weaviate vector store. The vector store is then used as a tool for agentic RAG to write a concise, easy-to-read summary of the week in AI research.

The final output is a short, weekly email sent to the address of your choice that summarizes key AI research trends and future research directions, with links directly to the most interesting and impactful arXiv papers of the week.

Who it's for

This workflow is for anyone who can't keep up with all the latest AI advances. Coding skills are not required.

How it works

The workflow runs end to end in two main parts: a data pipeline that fetches articles and embeds them in Weaviate, and an agentic workflow that generates a weekly email summary.

Part 1: Automatically fetch newly published articles on a weekly basis

  • Fetch article abstracts (and metadata) from arXiv's free API
  • Pre-process abstract data
  • Enrich each article with a primary topic, secondary topics, and estimated potential impact of the research using an LLM
  • Post-process data
  • Insert data and embeddings into Weaviate
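
For readers curious about what the first two steps of Part 1 look like outside of n8n, here is a minimal Python sketch of fetching and pre-processing abstracts from the arXiv API. It is an illustration, not the workflow's actual node configuration: the categories (`cs.AI`, `cs.LG`), the result limit, and the function names `build_query_url`, `parse_feed`, and `fetch_articles` are all assumptions chosen for the example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by the Atom feed


def build_query_url(categories, max_results=50):
    """Build an arXiv API URL for the newest papers in the given categories.

    The categories passed in are an assumption; the workflow may query others.
    """
    search = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def parse_feed(atom_xml):
    """Turn an Atom feed string into a list of cleaned article dicts."""
    root = ET.fromstring(atom_xml)
    articles = []
    for entry in root.findall(ATOM + "entry"):
        articles.append({
            "url": entry.findtext(ATOM + "id", default="").strip(),
            # Collapse the line breaks and repeated spaces arXiv leaves in text fields.
            "title": " ".join(entry.findtext(ATOM + "title", default="").split()),
            "abstract": " ".join(entry.findtext(ATOM + "summary", default="").split()),
            "published": entry.findtext(ATOM + "published", default="").strip(),
        })
    return articles


def fetch_articles(categories, max_results=50):
    """Call the API over the network and return parsed articles."""
    url = build_query_url(categories, max_results)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_feed(resp.read().decode("utf-8"))
```

Calling `fetch_articles(["cs.AI", "cs.LG"], max_results=20)` would return a list of dicts that the later enrichment and embedding steps could consume. Keeping URL building and parsing separate from the network call makes the parsing logic easy to check against a saved feed.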

Part 2: Use an AI Agent and Weaviate to generate a weekly summary email

  • Add Weaviate as a Tool to an AI agent node
  • Query Weaviate, agentically, to generate a report on the most important research trends of the week
  • Post-process data
  • Send the summary via email
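
To give a feel for what the agent's tool call does conceptually, here is a toy Python sketch of similarity search over embedded articles. It is a stand-in, not Weaviate's API: in the real workflow, Weaviate and the embedding model handle storage and search. The in-memory `store`, the two-dimensional vectors, and the function names `cosine_similarity` and `search_articles` are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search_articles(query_vector, store, limit=3):
    """Return the `limit` stored articles most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return [{"title": item["title"], "url": item["url"]} for item in ranked[:limit]]


# Hypothetical store: real embeddings have hundreds or thousands of dimensions.
store = [
    {"title": "RAG paper", "url": "u1", "vector": [1.0, 0.0]},
    {"title": "Tabular paper", "url": "u2", "vector": [0.0, 1.0]},
    {"title": "Mixed paper", "url": "u3", "vector": [1.0, 1.0]},
]
top = search_articles([1.0, 0.1], store, limit=2)
```

An agent with a tool like this decides for itself which queries to run and how many times, then writes the report from the articles it gets back. That loop is what makes the retrieval "agentic" rather than a single fixed lookup.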

Prerequisites

  1. An existing Weaviate cluster. You can view instructions for setting up a local cluster with Docker here or a Weaviate Cloud cluster here.
  2. API keys to generate embeddings and power chat models. We use a combination of OpenRouter and OpenAI models. Feel free to switch out the models as you like.
  3. An email address with SMTP privileges. This is the address the email will come from. In this demo we use a personal Gmail address. You can create a new credential to link an SMTP account using these instructions.
  4. A self-hosted n8n instance. See this video for how to get set up in just three minutes.

How to run the workflow

  1. Go through the prerequisites: create a Weaviate cluster (local or cloud), set up self-hosted n8n, enable SMTP access for your email account, and add your API keys and other credentials.
  2. Select the embedding and chat models you'd like to use.
  3. Enter the email addresses you want to send the email from and to.
  4. Let it rip.
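
If you're wondering what the sender and recipient settings in step 3 translate to, here is a rough Python sketch of composing and sending the summary over SMTP. In the workflow itself, n8n's email node and your SMTP credential take care of this. The function names `build_summary_email` and `send_email`, the default subject line, and the use of STARTTLS are assumptions for illustration; the host and port depend on your email provider.

```python
import smtplib
from email.message import EmailMessage


def build_summary_email(sender, recipient, body, subject="This week in AI research"):
    """Assemble a plain-text email message (the subject line is a made-up default)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_email(msg, host, port, username, password):
    """Send the message over SMTP, upgrading the connection with STARTTLS.

    host and port are provider-specific; check your provider's documentation.
    """
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)
```

Splitting message building from sending lets you inspect the message (for example, `print(build_summary_email(...))`) before any credentials are involved.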

Workflow output

The output for this workflow is a weekly email that summarizes key research trends and future research directions based on AI and ML papers published on arXiv.

Here's an example of a summary email:

Hey there,

Here's a quick rundown of the key trends in Machine Learning research from the past week.

💫 Key Research Trends This Week

This week saw significant advancements in retrieval-augmented systems, foundation models for specialized domains, and techniques balancing efficiency with performance.

  • Advanced RAG Architectures: Researchers are developing sophisticated RAG frameworks that go beyond simple document retrieval, with AdaPCR introducing passage combination retrieval and UrbanMind proposing a framework for urban intelligence with multilevel optimization.

  • Foundation Models for Tabular Data: The Real-TabPFN shows that targeted continued pre-training on real-world datasets can significantly boost the performance of foundation models for tabular data, outperforming models trained on broader, potentially noisier datasets.

  • Efficiency-Focused Techniques: Researchers are developing resourceful methods that maintain performance without expensive computations, like logit reweighting for topic-focused summarization and strategic querying for privacy-preserving personalization.

🔮 Future Research Directions

Based on current trends, we expect to see the following developments in the near future:

  • Explainable RAG Systems: Following the source attribution work in RAG systems, we can expect more research into making complex retrieval systems transparent and explainable for users.

  • Cross-Domain and Cross-Modal Fusion: The promising performance of vision-language and code-specialized LLMs in retrieval tasks points toward unified retrievers capable of handling text, code, images, and multimodal content.

  • Data-Centric Synthetic Generation: As shown by work on synthetic relational tabular data, we'll likely see more sophisticated approaches to generating high-quality synthetic data for pre-training foundation models in specialized domains.

This week highlights how researchers are making AI more efficient, explainable, and applicable to specialized domains. Look out for more developments in RAG systems, tabular foundation models, and privacy-preserving AI techniques in the coming weeks.

Until next week,

Archi 🧑🏽‍🔬

Want to make it better?

Feel free to tweak, build on, or completely reconfigure this workflow. If you come up with something cool, let us know and we might just share it with our community! 💚