2025/12/23 16:41:48

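Your requirement is clear: aggregate the face vector search results by person name (actor_name) so that each actor contributes only the single most similar image. Here is the optimized search approach: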

Solution

Method 1: Use an aggregation (recommended)

The query scores every document with the built-in Painless function cosineSimilarity, which assumes face_vector is mapped as dense_vector. A hand-written similarity function cannot read a dense_vector through doc[...], and script_score rejects negative scores, hence the + 1.0. The max_score sub-aggregation orders the actor buckets by their best score instead of the default document count.

GET /face_search_test_index/_search
{
  "size": 0,
  "aggs": {
    "actors": {
      "terms": {
        "field": "actor_name.keyword",
        "size": 10,
        "order": { "max_score": "desc" }
      },
      "aggs": {
        "max_score": { "max": { "script": { "source": "_score" } } },
        "top_hit": {
          "top_hits": {
            "size": 1,
            "sort": [ { "_score": { "order": "desc" } } ],
            "_source": { "includes": ["actor_name", "actor_id", "image_path"] }
          }
        }
      }
    }
  },
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'face_vector') + 1.0",
        "params": {
          "query_vector": [
            -0.03294802084565163, 0.08574431389570236, 0.04574661701917648, -0.03050283156335354,
            -0.06638835370540619, -0.04965103417634964, 0.06007932499051094, -0.17975950241088867,
            0.15759551525115967, -0.15764901041984558, 0.2375360131263733, -0.028777025640010834,
            -0.25363847613334656, -0.10428159683942795, 0.03292582929134369, 0.13914231956005096,
            -0.10663023591041565, -0.14371007680892944, -0.16282042860984802, -0.15108194947242737,
            0.07730557024478912, 0.06031457334756851, -0.013631811365485191, -0.03235295042395592,
            -0.1441582888364792, -0.22718726098537445, -0.09331541508436203, -0.03909553587436676,
            0.02703486755490303, -0.06274473667144775, 0.10269181430339813, 0.09461987763643265,
            -0.23343020677566528, -0.09261009097099304, 0.0035921353846788406, 0.036539267748594284,
            -0.08004020154476166, -0.02440868318080902, 0.15535330772399902, 0.02334958128631115,
            -0.13954328000545502, 0.00402874918654561, -0.028272267431020737, 0.219803124666214,
            0.1736876219511032, -0.015488659963011742, 0.01637592911720276, -0.14725874364376068,
            0.12355087697505951, -0.24118836224079132, 0.005113713443279266, 0.1290103644132614,
            0.10923261940479279, 0.12515988945960999, 0.05419561266899109, -0.09587032347917557,
            0.07226473093032837, 0.12341351062059402, -0.18840639293193817, 0.06482309103012085,
            0.08653104305267334, -0.24938331544399261, -0.019747208803892136, -0.020754650235176086,
            0.09277407824993134, 0.014888722449541092, -0.009055963717401028, -0.12095478177070618,
            0.21467812359333038, -0.11203934997320175, -0.11639110743999481, 0.052483949810266495,
            -0.0846707746386528, -0.14047497510910034, -0.35461661219596863, -0.004548290744423866,
            0.3914399743080139, 0.10970800369977951, -0.18320702016353607, 0.016537625342607498,
            -0.0756637454032898, 0.04670151695609093, 0.09940724074840546, 0.022470811381936073,
            -0.01574861630797386, -0.103361114859581, -0.11547031253576279, 0.08255285769701004,
            0.21767574548721313, -0.11139923334121704, -0.06307178735733032, 0.22926700115203857,
            -0.008651845157146454, 0.02907070517539978, 0.05055316537618637, 0.030209437012672424,
            -0.0844748467206955, 0.01178983598947525, -0.07661733776330948, -0.015207079239189625,
            0.05650537833571434, -0.09215937554836273, 0.02020958811044693, 0.08764813095331192,
            -0.12565577030181885, 0.24529917538166046, -0.01250448264181614, 0.013047449290752411,
            -0.040126584470272064, -0.07195307314395905, -0.052720751613378525, 0.04307105392217636,
            0.14030040800571442, -0.17923066020011902, 0.26793503761291504, 0.1835155487060547,
            -0.06833052635192871, 0.15156754851341248, 0.02274666726589203, 0.08952273428440094,
            -0.07919059693813324, -0.07516366243362427, -0.18298988044261932, -0.12182115018367767,
            0.04411851987242699, 0.05293841287493706, 0.03631458058953285, 0.04240144044160843
          ]
        }
      }
    }
  }
}
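To consume the response of the aggregation query above, each bucket of the `actors` aggregation has to be unwrapped into a single record. The following is a minimal sketch, assuming the response has already been parsed into a Python dict; the helper name `best_hit_per_actor` is hypothetical. Because `top_hits` uses an explicit sort, the sketch falls back to the first `sort` value if `_score` comes back empty.

```python
from typing import Any, Dict, List


def best_hit_per_actor(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the `actors` terms aggregation into one record per actor."""
    results = []
    for bucket in response["aggregations"]["actors"]["buckets"]:
        hits = bucket["top_hit"]["hits"]["hits"]
        if not hits:
            continue  # nothing stored for this actor
        hit = hits[0]
        score = hit.get("_score")
        if score is None and hit.get("sort"):
            # With an explicit sort the score may only appear in `sort`.
            score = hit["sort"][0]
        source = hit.get("_source", {})
        results.append({
            "actor_name": bucket["key"],
            "score": score,
            "actor_id": source.get("actor_id"),
            "image_path": source.get("image_path"),
        })
    # Most similar actor first; records without a score sort last.
    results.sort(
        key=lambda r: r["score"] if r["score"] is not None else float("-inf"),
        reverse=True,
    )
    return results
```

Sorting again on the client keeps the output ordered by similarity even when the buckets arrive ordered by document count, which is the default for a terms aggregation.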

Method 2: k-NN search + client-side aggregation

If you prefer k-NN search, you can fetch the results first and then aggregate them on the client:

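from collections import defaultdict

from elasticsearch import Elasticsearch

# Assumption: adjust the URL and authentication to match your cluster.
es = Elasticsearch("http://localhost:9200")

# your_vector must hold the query face embedding as a list of floats.

# Run the k-NN search
response = es.search(
    index="face_search_test_index",
    body={
        "knn": {
            "field": "face_vector",
            "query_vector": your_vector,
            "k": 100,  # fetch enough results to aggregate
            "num_candidates": 1000,
        },
        "size": 100,  # without this only the default 10 hits are returned
        "_source": ["actor_name", "actor_id", "image_path"],
    },
)

# Client-side aggregation: group the hits by actor name
actor_groups = defaultdict(list)
for hit in response["hits"]["hits"]:
    actor_name = hit["_source"]["actor_name"]
    actor_groups[actor_name].append({
        "actor_name": actor_name,
        "score": hit["_score"],
        "actor_id": hit["_source"]["actor_id"],
        "image_path": hit["_source"]["image_path"],
    })

# Keep only the highest-scoring result for each actor
top_results = []
for actor_name, results in actor_groups.items():
    # Sort by score in descending order and take the first entry
    top_result = sorted(results, key=lambda x: x["score"], reverse=True)[0]
    top_results.append(top_result)

# Sort the final results by score
top_results = sorted(top_results, key=lambda x: x["score"], reverse=True)

# Print the results
for result in top_results:
    print(f"Actor: {result['actor_name']}, Score: {result['score']}, ID: {result['actor_id']}")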

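Notes

  1. Field type: the terms aggregation needs a keyword field. Either map actor_name as keyword and aggregate on actor_name, or, if it is a text field with a keyword sub-field, aggregate on actor_name.keyword as the query in Method 1 does.
  2. Performance: Method 1 (server-side aggregation) returns far less data and is usually the more efficient way to group. Note, however, that script_score over match_all scores every document, so on very large indices the approximate k-NN search of Method 2 may respond faster.
  3. Similarity threshold: you may want to set a similarity threshold to filter out results that are not similar enough.
  4. Vector dimension: make sure the query vector has the same dimension as the vectors stored in the index.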
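Notes 3 and 4 can be sketched as two small client-side helpers. This is only an illustration under stated assumptions: the names `check_dimension` and `filter_by_score` are hypothetical, the expected dimension must come from your own index mapping (the example query vector in Method 1 has 128 components), and the threshold has to be chosen on whatever scale your query's `_score` uses.

```python
from typing import Any, Dict, List, Sequence


def check_dimension(query_vector: Sequence[float], expected_dims: int) -> None:
    """Raise early if the query vector does not match the indexed dimension."""
    if len(query_vector) != expected_dims:
        raise ValueError(
            f"query vector has {len(query_vector)} dims, expected {expected_dims}"
        )


def filter_by_score(hits: List[Dict[str, Any]], min_score: float) -> List[Dict[str, Any]]:
    """Keep only hits whose _score reaches the (assumed) similarity threshold."""
    return [
        hit for hit in hits
        if hit.get("_score") is not None and hit["_score"] >= min_score
    ]
```

A typical call would be `check_dimension(your_vector, 128)` before searching and `filter_by_score(response["hits"]["hits"], threshold)` afterwards, with `threshold` tuned on known matching and non-matching faces.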

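Recommended approach

For your use case I recommend Method 1 (server-side aggregation): the grouping happens inside Elasticsearch, so only one hit per actor crosses the network. If actors seem to be missing from the buckets on a multi-shard index, consider raising the terms aggregation's shard_size parameter so that every shard returns enough candidate buckets to the coordinating node; this improves accuracy at some extra cost per shard.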
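As a rough sketch of where shard_size goes, the helper below builds the `aggs` part of the Method 1 request as a Python dict. The function name `actors_aggregation` and the default of 50 are illustrative assumptions, not tuned values; `shard_size` itself is a standard option of the terms aggregation.

```python
from typing import Any, Dict


def actors_aggregation(size: int = 10, shard_size: int = 50) -> Dict[str, Any]:
    """Build the per-actor aggregation with an explicit shard_size."""
    return {
        "actors": {
            "terms": {
                "field": "actor_name.keyword",
                "size": size,
                # Each shard proposes this many buckets before the final merge.
                "shard_size": shard_size,
            },
            "aggs": {
                "top_hit": {
                    "top_hits": {
                        "size": 1,
                        "sort": [{"_score": {"order": "desc"}}],
                        "_source": {
                            "includes": ["actor_name", "actor_id", "image_path"]
                        },
                    }
                }
            },
        }
    }
```

The returned dict can be passed as the `aggs` value of the search body; a larger `shard_size` makes each shard do more work, so raise it only as far as the results require.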

I hope this solves your problem! If you have any other questions, feel free to ask.
