Daily RAG Research Paper Hub with arXiv, Gemini AI, and Notion

Created by: dongou

Last update a day ago

Fetch research papers on a user-defined topic from arXiv on a daily schedule, process and structure the data, create or update entries in a Notion database, and deliver a daily digest by email and IM.

  • Paper Topic: single query keyword
  • Update Frequency: Daily updates, with fewer than 20 entries expected per day
  • Tools:
    • Platform: n8n, for end-to-end workflow configuration
    • AI Model: Gemini-2.5-Flash, for daily paper summarization and data processing
    • Database: Notion, with two tables — Daily Paper Summary and Paper Details
    • Messaging: Feishu (IM bot notifications), Gmail (email notifications)

1. Data Retrieval

arXiv API

arXiv provides a public API for querying research papers by topic or by predefined category.

arXiv API User Manual

Key Notes:

  1. Response Format: The API returns results as an Atom 1.0 XML feed.
  2. Timezone & Update Frequency:
    • The arXiv submission process operates on a 24-hour cycle.
    • Newly submitted articles become available through the API only at the midnight after they have been processed.
    • Feeds are updated daily at midnight Eastern Standard Time (EST).
    • Therefore, a single request per day is sufficient.
  3. Request Limits:
    • A single query can return at most 30,000 results in total.
    • Results must be retrieved in slices of at most 2,000 at a time, using the start and max_results query parameters.
  4. Time Format:
    • A submittedDate filter takes a range in the form [YYYYMMDDTTTT+TO+YYYYMMDDTTTT].
    • TTTT is the time of day in 24-hour format, to the minute, in GMT.
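
The request limits above can be sketched as a small helper that splits a query into slices. This is a minimal sketch, not the workflow's actual node configuration: the endpoint is the public arXiv API query URL, while the function name buildQueryUrls, the all: field prefix, and the keyword RAG are assumptions chosen for illustration.

```javascript
// Sketch: build arXiv API request URLs in slices of at most 2,000 results.
const ARXIV_ENDPOINT = "http://export.arxiv.org/api/query";
const MAX_SLICE = 2000;  // per-request cap from the notes above
const MAX_TOTAL = 30000; // overall cap from the notes above

// searchQuery is assumed to be already URL-safe, e.g. "all:RAG".
function buildQueryUrls(searchQuery, total) {
  const capped = Math.min(total, MAX_TOTAL);
  const urls = [];
  for (let start = 0; start < capped; start += MAX_SLICE) {
    const size = Math.min(MAX_SLICE, capped - start);
    urls.push(
      `${ARXIV_ENDPOINT}?search_query=${searchQuery}&start=${start}&max_results=${size}`
    );
  }
  return urls;
}

// With fewer than 20 papers expected per day, one request is enough.
console.log(buildQueryUrls("all:RAG", 20));
```

For the daily volume described here the loop runs once; the slicing only matters if the query is widened or a backlog is fetched.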

Scheduled Task

  • Execution Frequency: Daily
  • Execution Time: 6:00 AM
  • Time Parameter Handling (JS):
    According to arXiv’s update rules, the scheduled task should query the previous day’s (T-1) submittedDate data.
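
A minimal sketch of the T-1 time parameter, as it might be written in an n8n Code node. It assumes the previous day is taken in UTC (the document does not state the scheduler's timezone), and the function names and the keyword RAG are hypothetical.

```javascript
// Format a Date as YYYYMMDD using its UTC fields (arXiv expects GMT).
function formatUtcDay(date) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${y}${m}${d}`;
}

// Build a submittedDate filter covering the whole previous UTC day (T-1).
function previousDayFilter(now) {
  const dayMs = 24 * 60 * 60 * 1000;
  const day = formatUtcDay(new Date(now.getTime() - dayMs));
  return `submittedDate:[${day}0000+TO+${day}2359]`;
}

// Hypothetical keyword "RAG"; in the workflow this would come from the user's topic.
const filter = previousDayFilter(new Date("2024-09-10T06:00:00Z"));
console.log(`all:RAG+AND+${filter}`);
// all:RAG+AND+submittedDate:[202409090000+TO+202409092359]
```

Passing the current time in as an argument keeps the function easy to check against fixed dates; in the scheduled run it would be called with new Date().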

2. Data Extraction

Data Cleaning Rules (Convert to Standard JSON)

  1. Remove Header

    • Keep only the 【entry】【/entry】 blocks representing paper items.
  2. Single Item

    • Each 【entry】【/entry】 represents a single item.
  3. Field Processing Rules

    • 【id】【/id】 ➡️ id
      Extract content.
      Example:
      【id】http://arxiv.org/abs/2409.06062v1【/id】 ➡️ http://arxiv.org/abs/2409.06062v1

    • 【updated】【/updated】 ➡️ updated
      Convert timestamp to yyyy-mm-dd hh:mm:ss

    • 【published】【/published】 ➡️ published
      Convert timestamp to yyyy-mm-dd hh:mm:ss

    • 【title】【/title】 ➡️ title
      Extract text content

    • 【summary】【/summary】 ➡️ summary
      Keep text, remove line breaks

    • 【author】【/author】 ➡️ author
      Combine all authors into an array
      Example: [ "Ernest Pusateri", "Anmol Walia" ] (for Notion multi-select field)

    • 【arxiv:comment】【/arxiv:comment】 ➡️ Ignore / discard

    • 【link type="text/html"】 ➡️ html_url
      Extract URL

    • 【link type="application/pdf"】 ➡️ pdf_url
      Extract URL

    • 【arxiv:primary_category term="cs.CL"】 ➡️ primary_category
      Extract term value

    • 【category】 ➡️ category
      Merge all 【category】 values into an array
      Example: [ "eess.AS", "cs.SD" ] (for Notion multi-select field)

  4. Add Empty Fields

    • github
    • huggingface
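
The cleaning rules above can be sketched with plain regular expressions. This is an illustrative sketch only: the workflow may use n8n's XML node or a proper parser instead, all function names are hypothetical, and the sample entry uses placeholder title, abstract, and timestamps. XML entities such as &amp; are not decoded here.

```javascript
// Return the first capture group of a pattern, trimmed, or "" if absent.
function firstMatch(xml, pattern) {
  const m = xml.match(pattern);
  return m ? m[1].trim() : "";
}

// Return the first capture group of every match (pattern needs the g flag).
function allMatches(xml, pattern) {
  return Array.from(xml.matchAll(pattern), (m) => m[1].trim());
}

// "2024-09-09T20:53:15Z" -> "2024-09-09 20:53:15" (value stays in UTC).
function toTimestamp(iso) {
  return iso.replace("T", " ").replace("Z", "");
}

// Find the href of the <link> tag with the given type attribute.
function linkHref(entry, type) {
  for (const m of entry.matchAll(/<link\b[^>]*>/g)) {
    if (m[0].includes(`type="${type}"`)) {
      const href = m[0].match(/href="([^"]*)"/);
      return href ? href[1] : "";
    }
  }
  return "";
}

function entryToRecord(entry) {
  return {
    id: firstMatch(entry, /<id>([\s\S]*?)<\/id>/),
    updated: toTimestamp(firstMatch(entry, /<updated>([\s\S]*?)<\/updated>/)),
    published: toTimestamp(firstMatch(entry, /<published>([\s\S]*?)<\/published>/)),
    title: firstMatch(entry, /<title>([\s\S]*?)<\/title>/).replace(/\s+/g, " "),
    summary: firstMatch(entry, /<summary>([\s\S]*?)<\/summary>/).replace(/\s+/g, " "),
    author: allMatches(entry, /<author>\s*<name>([\s\S]*?)<\/name>/g),
    html_url: linkHref(entry, "text/html"),
    pdf_url: linkHref(entry, "application/pdf"),
    primary_category: firstMatch(entry, /<arxiv:primary_category[^>]*term="([^"]*)"/),
    category: allMatches(entry, /<category[^>]*term="([^"]*)"/g),
    github: "",      // empty field, filled in later
    huggingface: "", // empty field, filled in later
  };
}

// Drop the feed header by keeping only the <entry> blocks.
function feedToRecords(feedXml) {
  return allMatches(feedXml, /<entry>([\s\S]*?)<\/entry>/g).map(entryToRecord);
}

// Illustrative entry with placeholder text and timestamps.
const sampleFeed = `
<entry>
  <id>http://arxiv.org/abs/2409.06062v1</id>
  <updated>2024-09-09T20:53:15Z</updated>
  <published>2024-09-09T20:53:15Z</published>
  <title>Placeholder title
  spanning two lines</title>
  <summary>Placeholder abstract.
  Second line.</summary>
  <author><name>Ernest Pusateri</name></author>
  <author><name>Anmol Walia</name></author>
  <link href="http://arxiv.org/abs/2409.06062v1" rel="alternate" type="text/html"/>
  <link title="pdf" href="http://arxiv.org/pdf/2409.06062v1" rel="related" type="application/pdf"/>
  <arxiv:primary_category term="cs.CL" scheme="http://arxiv.org/schemas/atom"/>
  <category term="eess.AS" scheme="http://arxiv.org/schemas/atom"/>
  <category term="cs.SD" scheme="http://arxiv.org/schemas/atom"/>
</entry>`;

console.log(JSON.stringify(feedToRecords(sampleFeed), null, 2));
```

The 【arxiv:comment】 element is discarded simply by never extracting it.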

3. Data Processing

Analyze and summarize paper data using AI, then standardize output as JSON.

  • Single Paper Basic Information Analysis and Enhancement
  • Daily Paper Summary and Multilingual Translation
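
One way to standardize the model output as JSON is to extract the JSON object from the reply and fill in any missing keys. The sketch below assumes the model was asked to return a single JSON object; the function names and the keys summary_en and keywords are hypothetical, not the workflow's actual schema.

```javascript
// Extract and parse the outermost {...} span from a model reply.
// This also tolerates replies wrapped in a code fence or extra prose.
function parseModelJson(raw) {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  return JSON.parse(raw.slice(start, end + 1));
}

// Keep only the expected keys; replace missing or null values with "".
function normalizeAnalysis(raw, expectedKeys) {
  const parsed = parseModelJson(raw);
  const out = {};
  for (const key of expectedKeys) {
    const value = parsed[key];
    out[key] = value === undefined || value === null ? "" : value;
  }
  return out;
}

// Hypothetical reply and key names, for illustration only.
const reply = 'Here is the result:\n{"summary_en": "Short summary.", "keywords": null}';
console.log(normalizeAnalysis(reply, ["summary_en", "keywords", "github"]));
```

Replacing null with an empty string at this stage keeps null values from reaching the Notion node later in the workflow.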

4. Data Storage: Notion Database

  • Create a corresponding database in Notion with the same predefined field names.
  • In Notion, create an integration under Integrations, grant it access to the database, and copy its Secret Key for use in the n8n Notion credentials.
  • Use the Notion "Create a database page" node to configure the field mapping and store the data.

Notes

  • "Create a database page" only adds new entries; it does not update existing ones.
  • The updated and published timestamps of arXiv papers are in UTC.
  • Notion single-select and multi-select fields only accept arrays. Comma-separated strings are not parsed automatically, so format the values as proper arrays before writing.
  • Notion does not accept null values; sending one causes a 400 error.
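
The last two notes can be handled in a small pre-processing step before the Notion node. This is a sketch under assumptions: the function names are hypothetical, the multi-select field list follows the extraction rules above, and whether an empty string is acceptable depends on the Notion property type, so the replacement value may need adjusting.

```javascript
// Fields mapped to Notion multi-select properties (per the extraction rules).
const MULTI_SELECT_FIELDS = ["author", "category"];

// Turn a comma-separated string into an array; pass arrays through.
function toArray(value) {
  if (Array.isArray(value)) return value;
  if (typeof value === "string" && value.trim() !== "") {
    return value.split(",").map((s) => s.trim()).filter((s) => s !== "");
  }
  return [];
}

// Make a record safe to write: arrays for multi-select, no null values.
function prepareForNotion(record) {
  const out = {};
  for (const [key, value] of Object.entries(record)) {
    if (MULTI_SELECT_FIELDS.includes(key)) {
      out[key] = toArray(value);
    } else {
      // Assumption: an empty string is acceptable for this property type.
      out[key] = value === null || value === undefined ? "" : value;
    }
  }
  return out;
}

console.log(
  prepareForNotion({
    title: "Placeholder title",
    author: "Ernest Pusateri, Anmol Walia",
    category: ["eess.AS", "cs.SD"],
    github: null,
  })
);
```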

5. Data Delivery

Set up two message delivery channels, email and IM, and define the message format and content for each.

Email: Gmail

Gmail OAuth 2.0 – Official Documentation
Configure your OAuth consent screen

Steps:

  • Enable Gmail API
  • Create OAuth consent screen
  • Create OAuth client credentials
  • Audience: add test users while the app is in Testing status

Message format: HTML
(Model: OpenAI GPT — used to design an HTML email template)
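
As a rough illustration of the HTML message format, the sketch below renders a list of paper records into a simple email body. It is not the template used by the workflow; the function names and layout are assumptions, and the field names follow the extraction rules in section 2.

```javascript
// Escape text so paper titles and abstracts cannot break the markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one list item per paper: linked title plus summary.
function buildDigestHtml(dateLabel, papers) {
  const items = papers
    .map(
      (p) =>
        `<li><a href="${escapeHtml(p.html_url)}">${escapeHtml(p.title)}</a>` +
        `<br>${escapeHtml(p.summary)}</li>`
    )
    .join("\n");
  return `<h2>arXiv papers for ${escapeHtml(dateLabel)}</h2>\n<ul>\n${items}\n</ul>`;
}

// Placeholder paper record, for illustration only.
console.log(
  buildDigestHtml("2024-09-09", [
    {
      title: "Placeholder title",
      html_url: "http://arxiv.org/abs/2409.06062v1",
      summary: "Placeholder abstract.",
    },
  ])
);
```

The resulting string can be passed to the Gmail node as the HTML message body.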

IM: Feishu (Lark)

Bots in groups
Use bots in groups
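
A group bot message can be sketched as a JSON payload for the bot's webhook. The shape below (msg_type "text" with a content.text field) reflects the Feishu custom bot text message format as commonly documented; treat it as an assumption and check it against the linked bot documentation. The function name is hypothetical and the webhook URL is deliberately left out.

```javascript
// Build a plain-text bot message listing each paper's title and link.
function buildFeishuTextMessage(dateLabel, papers) {
  const lines = papers.map((p, i) => `${i + 1}. ${p.title}\n${p.html_url}`);
  return {
    msg_type: "text",
    content: {
      text: `arXiv papers for ${dateLabel}\n\n${lines.join("\n\n")}`,
    },
  };
}

// Placeholder paper record, for illustration only.
const message = buildFeishuTextMessage("2024-09-09", [
  { title: "Placeholder title", html_url: "http://arxiv.org/abs/2409.06062v1" },
]);
console.log(JSON.stringify(message, null, 2));
```

In n8n, a payload like this would typically be sent with an HTTP Request node as a POST to the webhook URL generated when the bot is added to the group.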