Finance Data Agent

🤖 Automated Quantitative Trading & Iterative Factor Evolution

📖 Background

In quantitative trading, factors are the signals that traders use to exploit market inefficiencies. They range from simple metrics such as price-to-earnings ratios to the outputs of complex models such as discounted cash flow valuations, and they are the main inputs for predicting stock price movements.

By applying these factors systematically, quantitative traders can build strategies that identify market patterns and improve trading efficiency and precision. Discovering, testing, and refining such factors is where the Finance Data Agent comes into play.

🌟 Introduction

In this scenario, the agent demonstrates an iterative loop of hypothesis generation, knowledge construction, and decision-making, showing how financial factors evolve through continuous feedback and refinement.

The loop consists of the following steps:

Step 1 : Hypothesis Generation 🔍

  • Generate and propose initial hypotheses based on previous experiment analysis and domain expertise, with thorough reasoning and financial justification.

Step 2 : Factor Creation ✨

  • Split the hypothesis into individual factor tasks.

  • Each task defines one new financial factor to be developed and implemented, including its name, description, formulation, and variables.
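As a rough illustration of what one such task might carry, the sketch below bundles the four pieces named above into a small record. The class name `FactorTaskSketch` and the example factor are hypothetical and are not taken from the RD-Agent code base.

```python
from dataclasses import dataclass, field


@dataclass
class FactorTaskSketch:
    """Hypothetical container for one factor task (not an RD-Agent class)."""

    name: str
    description: str
    formulation: str  # the formula, written as plain text
    variables: dict[str, str] = field(default_factory=dict)  # symbol -> meaning


# Illustrative example only: a 10-day price momentum factor.
momentum_task = FactorTaskSketch(
    name="Momentum10",
    description="Close-to-close return over the past 10 trading days.",
    formulation="close_t / close_{t-10} - 1",
    variables={
        "close_t": "closing price on day t",
        "close_{t-10}": "closing price 10 trading days earlier",
    },
)
```

Keeping the formulation and its variables next to the name and description gives the implementation step everything it needs to generate code for the factor.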

Step 3 : Factor Implementation 👨‍💻

  • Implement the factor code based on the description, evolving it as a developer would.

  • Quantitatively validate the newly created factors.
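To make the implement-then-validate idea concrete, here is a minimal sketch using only the standard library. The momentum factor, the function names, and the specific checks (output length and share of missing values) are assumptions chosen for illustration; the checks RD-Agent actually runs may differ.

```python
import math


def momentum(closes: list[float], window: int = 10) -> list[float]:
    """Return close_t / close_{t-window} - 1, with NaN where history is too short."""
    values = []
    for t, price in enumerate(closes):
        if t < window:
            values.append(math.nan)
        else:
            values.append(price / closes[t - window] - 1.0)
    return values


def looks_valid(values: list[float], expected_len: int, max_nan_ratio: float = 0.5) -> bool:
    """Basic sanity checks: correct length and not too many missing values."""
    if expected_len == 0 or len(values) != expected_len:
        return False
    nan_count = sum(1 for v in values if math.isnan(v))
    return nan_count / expected_len <= max_nan_ratio


closes = [10.0, 11.0, 12.1, 13.31]
factor_values = momentum(closes, window=1)  # first entry is NaN, the rest are about 0.1
is_ok = looks_valid(factor_values, expected_len=len(closes))
```

A check like this catches shape mismatches and mostly-empty outputs early, before a factor reaches the more expensive backtesting step.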

Step 4 : Backtesting with Qlib 📉

  • Integrate the full dataset into the factor implementation code and prepare the factor library.

  • Backtest in Qlib with LGBModel, using Alpha158 plus the newly developed factors, to evaluate the effectiveness and performance of the new factors.

  Dataset | Model    | Factors       | Data Split
  CSI300  | LGBModel | Alpha158 Plus | Train: 2008-01-01 to 2014-12-31
          |          |               | Valid: 2015-01-01 to 2016-12-31
          |          |               | Test:  2017-01-01 to 2020-08-01
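The data split above can be written down as a plain mapping, which makes it easy to check which segment a given trading day falls into. This is only a sketch: the `SEGMENTS` name and the `segment_of` helper are hypothetical and are not part of RD-Agent or Qlib.

```python
from datetime import date
from typing import Optional

# Default split from the setup above, as (start, end) pairs with inclusive bounds.
SEGMENTS = {
    "train": ("2008-01-01", "2014-12-31"),
    "valid": ("2015-01-01", "2016-12-31"),
    "test": ("2017-01-01", "2020-08-01"),
}


def segment_of(day: str) -> Optional[str]:
    """Return the segment containing the ISO-formatted day, or None if outside all of them."""
    d = date.fromisoformat(day)
    for name, (start, end) in SEGMENTS.items():
        if date.fromisoformat(start) <= d <= date.fromisoformat(end):
            return name
    return None
```

For example, `segment_of("2015-06-01")` returns `"valid"`, while a date after 2020-08-01 returns `None`.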

Step 5 : Feedback Analysis 🔍

  • Analyze backtest results to assess performance.

  • Incorporate feedback to refine hypotheses and improve the model.

Step 6 : Hypothesis Refinement ♻️

  • Refine hypotheses based on feedback from backtesting.

  • Repeat the process to continuously improve the model.
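The six steps above can be sketched as a simple loop. Everything here is a stand-in: the function `run_factor_loop` and its callable parameters are hypothetical names chosen to mirror the steps, and the real agent's control flow is likely more involved.

```python
def run_factor_loop(propose, to_tasks, implement, backtest, summarize, rounds):
    """Run `rounds` iterations and return the (hypothesis, feedback) history."""
    history = []
    for _ in range(rounds):
        # Steps 1 and 6: propose a hypothesis, informed by earlier feedback.
        hypothesis = propose(history)
        # Step 2: split the hypothesis into factor tasks.
        tasks = to_tasks(hypothesis)
        # Step 3: implement each factor.
        factors = [implement(task) for task in tasks]
        # Step 4: backtest the implemented factors.
        result = backtest(factors)
        # Step 5: turn the backtest result into feedback.
        feedback = summarize(hypothesis, result)
        history.append((hypothesis, feedback))
    return history


# Toy stand-ins so the sketch runs end to end.
history = run_factor_loop(
    propose=lambda past: f"hypothesis #{len(past) + 1}",
    to_tasks=lambda hyp: [f"{hyp} / factor A", f"{hyp} / factor B"],
    implement=lambda task: task.upper(),
    backtest=lambda factors: {"n_factors": len(factors)},
    summarize=lambda hyp, result: f"{hyp} tested with {result['n_factors']} factors",
    rounds=3,
)
```

Passing the accumulated history back into `propose` is what makes the process iterative: each new hypothesis can build on the feedback from earlier rounds.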

⚡ Quick Start

Before you start, follow the installation instructions in Installation and Configuration to set up your system dependencies.

You can try the demo by following these steps:

  • 🐍 Create a Conda Environment

    • Create a new conda environment with Python (3.10 and 3.11 are well tested in our CI):

      conda create -n rdagent python=3.10
      
    • Activate the environment:

      conda activate rdagent
      
  • 📦 Install the RDAgent

    • You can install the RDAgent package from PyPI:

      pip install rdagent
      
  • 🚀 Run the Application

    • You can directly run the application by using the following command:

      rdagent fin_factor
      

🛠️ Usage of modules

  • Env Config

The following environment variables can be set in the .env file to customize the application’s behavior:

pydantic settings rdagent.app.qlib_rd_loop.conf.FactorBasePropSetting

JSON schema:
{
   "title": "FactorBasePropSetting",
   "type": "object",
   "properties": {
      "scen": {
         "default": "rdagent.scenarios.qlib.experiment.factor_experiment.QlibFactorScenario",
         "title": "Scen",
         "type": "string"
      },
      "knowledge_base": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Knowledge Base"
      },
      "knowledge_base_path": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Knowledge Base Path"
      },
      "hypothesis_gen": {
         "default": "rdagent.scenarios.qlib.proposal.factor_proposal.QlibFactorHypothesisGen",
         "title": "Hypothesis Gen",
         "type": "string"
      },
      "interactor": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Interactor"
      },
      "hypothesis2experiment": {
         "default": "rdagent.scenarios.qlib.proposal.factor_proposal.QlibFactorHypothesis2Experiment",
         "title": "Hypothesis2Experiment",
         "type": "string"
      },
      "coder": {
         "default": "rdagent.scenarios.qlib.developer.factor_coder.QlibFactorCoSTEER",
         "title": "Coder",
         "type": "string"
      },
      "runner": {
         "default": "rdagent.scenarios.qlib.developer.factor_runner.QlibFactorRunner",
         "title": "Runner",
         "type": "string"
      },
      "summarizer": {
         "default": "rdagent.scenarios.qlib.developer.feedback.QlibFactorExperiment2Feedback",
         "title": "Summarizer",
         "type": "string"
      },
      "evolving_n": {
         "default": 10,
         "title": "Evolving N",
         "type": "integer"
      },
      "train_start": {
         "default": "2008-01-01",
         "title": "Train Start",
         "type": "string"
      },
      "train_end": {
         "default": "2014-12-31",
         "title": "Train End",
         "type": "string"
      },
      "valid_start": {
         "default": "2015-01-01",
         "title": "Valid Start",
         "type": "string"
      },
      "valid_end": {
         "default": "2016-12-31",
         "title": "Valid End",
         "type": "string"
      },
      "test_start": {
         "default": "2017-01-01",
         "title": "Test Start",
         "type": "string"
      },
      "test_end": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "2020-08-01",
         "title": "Test End"
      }
   },
   "additionalProperties": false
}

Config:
  • env_prefix: str = QLIB_FACTOR_

  • protected_namespaces: tuple = ()
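As a sketch of how the prefix relates to entries in the .env file, the helper below builds .env lines under the assumption that each variable name is the prefix followed by the upper-cased field name. The `env_lines` helper and the override values are hypothetical and only illustrate the naming pattern.

```python
def env_lines(prefix: str, overrides: dict[str, object]) -> list[str]:
    """Build .env lines, assuming variable name = prefix + upper-cased field name."""
    return [f"{prefix}{name.upper()}={value}" for name, value in overrides.items()]


# Illustrative overrides for two of the fields listed in the schema above.
lines = env_lines("QLIB_FACTOR_", {"evolving_n": 5, "test_end": "2019-12-31"})
# lines == ["QLIB_FACTOR_EVOLVING_N=5", "QLIB_FACTOR_TEST_END=2019-12-31"]
```

Writing these lines to the .env file would, under that naming assumption, override the corresponding defaults while leaving all other fields unchanged.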

field coder: str = 'rdagent.scenarios.qlib.developer.factor_coder.QlibFactorCoSTEER'

Coder class

field evolving_n: int = 10

Number of evolutions

field hypothesis2experiment: str = 'rdagent.scenarios.qlib.proposal.factor_proposal.QlibFactorHypothesis2Experiment'

Hypothesis to experiment class

field hypothesis_gen: str = 'rdagent.scenarios.qlib.proposal.factor_proposal.QlibFactorHypothesisGen'

Hypothesis generation class

field runner: str = 'rdagent.scenarios.qlib.developer.factor_runner.QlibFactorRunner'

Runner class

field scen: str = 'rdagent.scenarios.qlib.experiment.factor_experiment.QlibFactorScenario'

Scenario class for Qlib Factor

field summarizer: str = 'rdagent.scenarios.qlib.developer.feedback.QlibFactorExperiment2Feedback'

Summarizer class

field test_end: str | None = '2020-08-01'

End date of the test / backtest segment

field test_start: str = '2017-01-01'

Start date of the test / backtest segment

field train_end: str = '2014-12-31'

End date of the training segment

field train_start: str = '2008-01-01'

Start date of the training segment

field valid_end: str = '2016-12-31'

End date of the validation segment

field valid_start: str = '2015-01-01'

Start date of the validation segment
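When overriding the segment dates, it helps to confirm that the three segments remain in chronological order and do not overlap, so that the validation and test periods never precede the training data. The check below is a minimal sketch with a hypothetical `check_segments` helper; it assumes that a `test_end` of `None` means an open-ended test period, which the field descriptions above do not state.

```python
from datetime import date
from typing import Optional


def check_segments(
    train: tuple[str, str],
    valid: tuple[str, str],
    test: tuple[str, Optional[str]],
) -> list[str]:
    """Return a list of problems; an empty list means the split is consistent."""
    problems = []
    previous_end = None
    for name, (start, end) in [("train", train), ("valid", valid), ("test", test)]:
        start_day = date.fromisoformat(start)
        # Assumption: a missing end date means the segment is open-ended.
        end_day = date.fromisoformat(end) if end is not None else date.max
        if start_day > end_day:
            problems.append(f"{name}: start is after end")
        if previous_end is not None and start_day <= previous_end:
            problems.append(f"{name}: overlaps the previous segment")
        previous_end = end_day
    return problems


# The default dates from the settings above pass the check.
default_problems = check_segments(
    ("2008-01-01", "2014-12-31"),
    ("2015-01-01", "2016-12-31"),
    ("2017-01-01", "2020-08-01"),
)
```

With the default dates the returned list is empty; moving `valid_start` back to a date inside the training period would produce an overlap message instead.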

pydantic settings rdagent.components.coder.factor_coder.config.FactorCoSTEERSettings

JSON schema:
{
   "title": "FactorCoSTEERSettings",
   "type": "object",
   "properties": {
      "coder_use_cache": {
         "default": false,
         "title": "Coder Use Cache",
         "type": "boolean"
      },
      "max_loop": {
         "default": 10,
         "title": "Max Loop",
         "type": "integer"
      },
      "fail_task_trial_limit": {
         "default": 20,
         "title": "Fail Task Trial Limit",
         "type": "integer"
      },
      "v1_query_former_trace_limit": {
         "default": 3,
         "title": "V1 Query Former Trace Limit",
         "type": "integer"
      },
      "v1_query_similar_success_limit": {
         "default": 3,
         "title": "V1 Query Similar Success Limit",
         "type": "integer"
      },
      "v2_query_component_limit": {
         "default": 1,
         "title": "V2 Query Component Limit",
         "type": "integer"
      },
      "v2_query_error_limit": {
         "default": 1,
         "title": "V2 Query Error Limit",
         "type": "integer"
      },
      "v2_query_former_trace_limit": {
         "default": 3,
         "title": "V2 Query Former Trace Limit",
         "type": "integer"
      },
      "v2_add_fail_attempt_to_latest_successful_execution": {
         "default": false,
         "title": "V2 Add Fail Attempt To Latest Successful Execution",
         "type": "boolean"
      },
      "v2_error_summary": {
         "default": false,
         "title": "V2 Error Summary",
         "type": "boolean"
      },
      "v2_knowledge_sampler": {
         "default": 1.0,
         "title": "V2 Knowledge Sampler",
         "type": "number"
      },
      "knowledge_base_path": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Knowledge Base Path"
      },
      "new_knowledge_base_path": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "New Knowledge Base Path"
      },
      "enable_filelock": {
         "default": false,
         "title": "Enable Filelock",
         "type": "boolean"
      },
      "filelock_path": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Filelock Path"
      },
      "max_seconds_multiplier": {
         "default": 1000000,
         "title": "Max Seconds Multiplier",
         "type": "integer"
      },
      "data_folder": {
         "default": "git_ignore_folder/factor_implementation_source_data",
         "title": "Data Folder",
         "type": "string"
      },
      "data_folder_debug": {
         "default": "git_ignore_folder/factor_implementation_source_data_debug",
         "title": "Data Folder Debug",
         "type": "string"
      },
      "simple_background": {
         "default": false,
         "title": "Simple Background",
         "type": "boolean"
      },
      "file_based_execution_timeout": {
         "default": 3600,
         "title": "File Based Execution Timeout",
         "type": "integer"
      },
      "select_method": {
         "default": "random",
         "title": "Select Method",
         "type": "string"
      },
      "python_bin": {
         "default": "python",
         "title": "Python Bin",
         "type": "string"
      }
   },
   "additionalProperties": false
}

Config:
  • env_prefix: str = FACTOR_CoSTEER_

field data_folder: str = 'git_ignore_folder/factor_implementation_source_data'

Path to the folder containing financial data (defaults to the fundamental data in Qlib)

field data_folder_debug: str = 'git_ignore_folder/factor_implementation_source_data_debug'

Path to the folder containing partial financial data (for debugging)

field file_based_execution_timeout: int = 3600

Timeout in seconds for each factor implementation execution

field python_bin: str = 'python'

Path to the Python binary

field select_method: str = 'random'

Method used to select factor implementations

field simple_background: bool = False

Whether to use simple background information for code feedback
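To illustrate how settings such as python_bin and file_based_execution_timeout could be applied, the sketch below runs a script in a separate interpreter and stops it once the timeout is exceeded. The `run_factor_script` helper is hypothetical and built on the standard `subprocess` module; it is not RD-Agent's actual execution code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_factor_script(script: Path, python_bin: str = "python", timeout: int = 3600) -> tuple[bool, str]:
    """Run `script` with `python_bin`; return (succeeded, combined output or reason)."""
    try:
        proc = subprocess.run(
            [python_bin, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout,  # seconds, playing the role of file_based_execution_timeout
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    return proc.returncode == 0, proc.stdout + proc.stderr


# Usage: run a trivial script with the current interpreter and a short timeout.
with tempfile.TemporaryDirectory() as workdir:
    script_path = Path(workdir) / "factor.py"
    script_path.write_text("print('factor computed')\n")
    succeeded, output = run_factor_script(script_path, python_bin=sys.executable, timeout=60)
```

Running each implementation in its own process keeps a slow or crashing factor script from blocking the rest of the loop, and the returned output can be fed back as an error message for the next coding attempt.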