Bio Sequence

No sequence will be left unexplored with Patsnap Bio!

17 July 2023
4 min read

In biotechnology, biological sequences are the core elements of innovation, and keyword searches alone can overlook crucial information, which increases risk. Sequence searching is therefore widely used in patent work, particularly for freedom-to-operate (FTO) and novelty searches.

Most current search methods rely on homology-based sequence alignment algorithms, which look for similar sequences in sequence databases to make the results as comprehensive as possible. However, patents contain a peculiar kind of sequence known as a degenerate sequence.

What is a degenerate sequence? Patent drafters use an approach similar to the way chemical structures are described: they introduce degenerate symbols, wildcards, and operators into a sequence and define what each symbol stands for in the accompanying text. Such a degenerate (or generic) sequence has no biological significance of its own; it mainly serves to extend the scope of patent protection and to set up search barriers. Traditional sequence homology algorithms were not designed with these generic sequences in mind, so searches that rely on them risk omissions and cannot be counted on to find every potential target sequence.

According to statistics from the Patsnap Bio Sequence Database, these generic sequences are far from rare in the global patent literature: about 7.4 million nucleic acid sequences (7.12% of all nucleic acid sequences) and 1.31 million protein sequences (7.55% of all protein sequences). Because of their special symbols, this large body of generic sequences can distort search results and poses a high risk for sequence FTO.

For example, the query sequence

"EVGSYPAPSDACPSDYFYCDASGRSAGGGGTENLYFQGSGGS"

should match the following target sequence:

"EVGSYXXXXX XCXXXXXXCX XSGRSAGGGG TENLYFQGSG GS".

With traditional sequence-based search methods, the similarity reported by the BLAST algorithm is only 67%, although the true similarity is 100%. Searching for this type of sequence with conventional algorithms leads to one of two outcomes: either the sequence is not found at all, or it is dropped from the results because its similarity falls below the threshold. Either way, searchers cannot easily compare homology against the patent claims and may miss key sequence information.
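The gap between the two scores can be illustrated with a small Python sketch. It is not BLAST and not Patsnap's algorithm: it assumes an ungapped, equal-length comparison and assumes that every "X" in the target may stand for any residue (in a real patent, the allowed residues would come from the accompanying definitions). The function names are made up for illustration.

```python
WILDCARD = "X"  # assumption: 'X' in the target means "any residue"


def naive_identity(query: str, target: str) -> float:
    """Fraction of positions where both characters are literally equal."""
    if len(query) != len(target):
        raise ValueError("this sketch only handles equal-length, ungapped pairs")
    matches = sum(q == t for q, t in zip(query, target))
    return matches / len(query)


def wildcard_identity(query: str, target: str) -> float:
    """Same as naive_identity, but a wildcard in the target matches anything."""
    if len(query) != len(target):
        raise ValueError("this sketch only handles equal-length, ungapped pairs")
    matches = sum(t == WILDCARD or q == t for q, t in zip(query, target))
    return matches / len(query)


query = "EVGSYPAPSDACPSDYFYCDASGRSAGGGGTENLYFQGSGGS"
# Spaces in the published listing are only visual grouping, so drop them.
target = "EVGSYXXXXX XCXXXXXXCX XSGRSAGGGG TENLYFQGSG GS".replace(" ", "")

print(f"literal comparison:  {naive_identity(query, target):.0%}")     # 67%
print(f"wildcard-aware:      {wildcard_identity(query, target):.0%}")  # 100%
```

Only 28 of the 42 positions are literal matches, which gives the 67% figure; once the 14 placeholder positions are allowed to match, the identity is 100%.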

Patsnap Bio's Solution

To address the risk of missed hits caused by degenerate sequences, the Patsnap engineering team built a deep learning model on its in-house NLP, computer vision (CV), entity recognition, anaphora resolution, and related technologies. The model identifies and parses generic sequences and their substitution information in sequence listings and full-text patents, and the results are used to build a degenerate sequence search database.

Using a special sequence comparison algorithm, degenerate sequence search not only retrieves such generic sequences but also returns their true similarity. Patsnap's generic sequence retrieval solution thus further reduces the risk of missed hits in patent FTO and novelty searches.

Because a single degenerate formula sequence can represent on the order of one hundred billion concrete variants, traditional sequence alignment algorithms cannot meet the demand for real-time search. Patsnap uses a deeply customized sequence alignment algorithm that dynamically loads the substitution information of the generic sequence during the search, which keeps the search precise and the retrieval time within a reasonable range. In the scanning phase, a compression algorithm builds a seed word list for heuristic search, greatly reducing unnecessary comparisons and improving search efficiency. When the query sequence is compared with the target sequence, Patsnap's proprietary algorithm takes the substitution information into account, so alignments and query results are more precise and easier to read: the best alignment between query and target under the different variants is shown directly.
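Why enumeration does not scale, and why substitution information is better consulted during comparison, can be sketched in a few lines of Python. Everything here is hypothetical: the formula, the substitution table, and the function names are invented for illustration, and the seed extraction is a generic seed-and-extend idea rather than Patsnap's compression algorithm, which the article does not describe. The symbols B (D or N), Z (E or Q), and X (any residue) follow the usual amino acid ambiguity codes.

```python
from math import prod

STANDARD = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Hypothetical substitution table: placeholder symbol -> allowed residues.
SUBSTITUTIONS = {"X": STANDARD, "B": "DN", "Z": "EQ"}


def allowed_sets(formula: str, subs: dict) -> list:
    """Per-position set of residues that the formula permits."""
    return [set(subs.get(symbol, symbol)) for symbol in formula]


def variant_count(formula: str, subs: dict) -> int:
    """Number of concrete sequences encoded by the formula."""
    return prod(len(s) for s in allowed_sets(formula, subs))


def matches(query: str, formula: str, subs: dict) -> bool:
    """Check a query against the formula without enumerating any variants."""
    if len(query) != len(formula):
        return False
    return all(q in s for q, s in zip(query, allowed_sets(formula, subs)))


def literal_seeds(formula: str, subs: dict, k: int) -> list:
    """(offset, k-mer) pairs that contain no placeholder symbol.

    Such exact words can be indexed and used to shortlist candidates
    before the more expensive substitution-aware comparison runs.
    """
    return [
        (i, formula[i:i + k])
        for i in range(len(formula) - k + 1)
        if not any(c in subs for c in formula[i:i + k])
    ]


formula = "MKTXBCGSZXG"  # hypothetical degenerate formula
print(variant_count(formula, SUBSTITUTIONS))           # 20 * 2 * 2 * 20 = 1600
print(matches("MKTADCGSEWG", formula, SUBSTITUTIONS))  # True
print(matches("MKTAKCGSEWG", formula, SUBSTITUTIONS))  # False: K is not D or N
print(literal_seeds(formula, SUBSTITUTIONS, 3))        # [(0, 'MKT'), (5, 'CGS')]
```

The variant count is a product over positions, so it grows multiplicatively with every added placeholder, while the per-position check stays linear in the sequence length.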

Experience Degenerate Sequence Searching Now

In June 2023, the Patsnap Bio sequence database introduced a powerful degenerate sequence search feature. It gives researchers a robust tool backed by an extensive collection of degenerate sequences, making it easier to obtain accurate and relevant results from their searches.

To schedule a demo or learn more, visit patsnap.com/solutions/bio.
